6 Replies Latest reply on Mar 14, 2019 5:19 AM by NiMc_1688136

    CYW43907 Reset cause register explanation

    NiMc_1688136

      In the CYW43907, the appscr4_saved_core_status contains bits relating to the reset cause of the processor:

       

      s_error_log

      s_bp_reset_log

      force_proc_reset_log

       

      Does anyone know the details of each reset flag and what will cause them to be set?

       

      While my boards run, they occasionally reset at random, and I am trying to track down whether the cause is an exception, noise on the reset line, or some other issue. I cannot monitor the serial port on every device while running a debug build to catch an exception, so I am trying to find as many clues as I can.

        • 1. Re: CYW43907 Reset cause register explanation
          PriyaM_16

          We are aware of the issues you are facing due to random resets in your board. Are you using secure_sflash or xip in your design?

          • 2. Re: CYW43907 Reset cause register explanation
            NiMc_1688136

            Neither the secure_sflash nor the xip option is used in the application.

             

            The resets come at random times; the board could run for hours or days. I am not sure if there is an assert/software reset that gets triggered at some point or if this is a noise issue on the RESET_N line. I have seen the board randomly reset while a programming cable remained connected to it, and I caught what appeared to be noise on the reset line, but in normal operation no cables are attached.

             

            Speaking of the reset line, we have no external components on the line and the trace is mostly on an inner layer.

            • 3. Re: CYW43907 Reset cause register explanation
              NiMc_1688136

              PriyaM_16

              While running in debug I was able to capture one of these exceptions:

               

              Exception = data_abort_handler

              "2/1/2019 4:59:55 PM",data_abort_handler

              "2/1/2019 4:59:55 PM",DFSR : 0x00001C06

              "2/1/2019 4:59:55 PM",DFAR : 0x00000000

              "2/1/2019 4:59:55 PM",IFSR : 0x00000000

              "2/1/2019 4:59:55 PM",IFAR : 0x00000000

              "2/1/2019 4:59:55 PM",CPSR : 0x00000197

              "2/1/2019 4:59:55 PM",R0   : 0x00546148

              "2/1/2019 4:59:55 PM",R1   : 0x00005249

              "2/1/2019 4:59:55 PM",R2   : 0x00005248

              "2/1/2019 4:59:55 PM",R3   : 0x005EDDA0

              "2/1/2019 4:59:55 PM",R4   : 0x04040404

              "2/1/2019 4:59:55 PM",R5   : 0x05050505

              "2/1/2019 4:59:55 PM",R6   : 0x06060606

              "2/1/2019 4:59:55 PM",R7   : 0x005F14D0

              "2/1/2019 4:59:55 PM",R8   : 0x08080808

              "2/1/2019 4:59:55 PM",R9   : 0x09090909

              "2/1/2019 4:59:55 PM",R10  : 0x10101010

              "2/1/2019 4:59:55 PM",R11  : 0x11111111

              "2/1/2019 4:59:55 PM",R12  : 0x00000308

              "2/1/2019 4:59:55 PM",LR   : 0x004CD5DA

              status = CR4_FAULT_STATUS_ASYNC_EXTERNAL_ABORT_AXI_SLAVE_ERROR

               

               

              Every time I capture the exception, it is related to the same function and thread. Basically I have a gatekeeper thread that manages access to the console output. The gatekeeper thread uses malloc when passing the string to the queue and free once the string has been received and sent out the serial port. The gatekeeper thread is a copy of the aws_logging_task_dynamic buffers file provided in the AWS FreeRTOS libraries.

               

              I am encountering the error in free. I am using FreeRTOS Heap_3 configuration which provides a mutex around free. It is when FreeRTOS tries to perform xSemaphoreGiveRecursive in __malloc_unlock that the exception triggers. The newlib_malloc_mutex pointer is valid and the original data pointer of the string is valid.

               

              The exception is reproducible on my side but appears to be random.

               

              Do you have any suggestions for debugging an AXI slave error, or can you explain what it means?

              • 4. Re: CYW43907 Reset cause register explanation
                decac_1684766

                A description of the DFSR (Data Fault Status Register) is here http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0363e/BGBEDEIF.html for the R4.
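For anyone decoding the logged value by hand, here is a minimal sketch of splitting a DFSR word into its fields. The bit positions (status in bit 10 plus bits [3:0], RW in bit 11, SD in bit 12) and the status encodings are my reading of the Cortex-R4 documentation linked above, so please verify them against the TRM. The type and function names (`dfsr_fields_t`, `decode_dfsr`, `dfsr_status_name`) are made up for this illustration and are not part of WICED.

```c
#include <stdint.h>

/* Hypothetical helper type: the DFSR fields of interest. */
typedef struct {
    uint8_t status;   /* 5-bit fault status: DFSR bit 10 followed by bits [3:0] */
    uint8_t is_write; /* DFSR bit 11 (RW): 1 = write access, 0 = read access */
    uint8_t sd;       /* DFSR bit 12 (SD): 1 = AXI slave error, 0 = AXI decode error */
} dfsr_fields_t;

/* Split a raw DFSR value into its fields (layout assumed from the Cortex-R4 TRM). */
static dfsr_fields_t decode_dfsr(uint32_t dfsr)
{
    dfsr_fields_t f;
    f.status   = (uint8_t)((((dfsr >> 10) & 0x1u) << 4) | (dfsr & 0xFu));
    f.is_write = (uint8_t)((dfsr >> 11) & 0x1u);
    f.sd       = (uint8_t)((dfsr >> 12) & 0x1u);
    return f;
}

/* Name a few of the status encodings; anything else is reported as "other". */
static const char *dfsr_status_name(uint8_t status)
{
    switch (status) {
    case 0x01: return "alignment";
    case 0x08: return "synchronous external abort";
    case 0x0D: return "permission";
    case 0x16: return "asynchronous external abort";
    default:   return "other (see TRM)";
    }
}
```

Applied to the logged value 0x00001C06, this gives status 0x16 (asynchronous external abort) with the RW and SD bits set, which lines up with the reported CR4_FAULT_STATUS_ASYNC_EXTERNAL_ABORT_AXI_SLAVE_ERROR. As far as I understand, the DFAR is not meaningful for an asynchronous abort, which would explain why it reads 0x00000000 in the log, and the LR need not point at the instruction that actually issued the faulting access.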

                 

                I have hit this kind of error (a lot...) when I have either accidentally freed something multiple times, or initialized a mutex on the stack and then forgotten to de-initialize it. The latter corrupts the mutex linked list, and the first time something else accesses the mutex list it blows up.
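One cheap guard against the accidental double free is to clear the pointer in the same step that frees it. The sketch below uses the standard `free()` so it can be built anywhere; on the target you would presumably substitute `vPortFree()`. The macro name `FREE_AND_CLEAR` is made up for this example.

```c
#include <stdlib.h>

/* Free a heap block and null the caller's pointer in one step.
 * free(NULL) is a no-op, so a second call through the same variable is harmless. */
#define FREE_AND_CLEAR(p)   \
    do {                    \
        free(p);            \
        (p) = NULL;         \
    } while (0)
```

This only protects against a second free through the same pointer variable. If copies of the pointer exist elsewhere (for example in a queue item), they still dangle, so it narrows the search rather than ruling out a double free.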

                • 5. Re: CYW43907 Reset cause register explanation
                  NiMc_1688136

                  I am still seeing this issue, and the exception is always related to the free function, from the same thread.

                   

                  FreeRTOS is using heap_3, so malloc/free is protected. I have tested with both the WICED modification that uses a recursive mutex and the default FreeRTOS implementation (which suspends the scheduler).

                   

                  From what I can tell, it runs fine for a while with all pointers valid, and then it randomly breaks. I do not know if this is related to an advanced feature of the CYW43907 such as the memory buses or cache. The problem seems to happen less often if I add code before the call to vPortFree.

                   

                  The exception is data_abort, the LR always points to some statement in free() (the exact statement seems to differ from time to time, based on the disassembly window), and the status shows CR4_FAULT_STATUS_ASYNC_EXTERNAL_ABORT_AXI_SLAVE_ERROR.

                   

                  I am positive that the pointer has not been previously freed.

                   

                  Consider the following code, a UDP logging thread that pulls a pointer from the queue, acts on the pointer (sends data over UDP), and then frees the pointer.

                  Note that the exception still triggers periodically if txDebugSocket_UDP is removed, and the pointer and its data look fine prior to calling vPortFree.

                   

                      for( ;; )
                      {
                          /* Block to wait for the next string to print. */
                          if( udpEnabled )
                          {
                              if( xQueueReceive( UDP_Queue, &ptrMsg, portMAX_DELAY ) == pdPASS )
                              {
                                  txDebugSocket_UDP( ( char * ) ptrMsg->data, ptrMsg->size, ptrMsg->port );
                                  lastAddress = uxTaskGetStackHighWaterMark( NULL );
                                  vPortFree( ( void * ) ptrMsg );
                              }
                          }
                          else
                          {
                              wiced_rtos_delay_milliseconds( 1000 );
                          }
                      }
                  
                  

                   

                   

                  I have checked the stack watermark of the UDP log thread and it is good. I have not checked the stacks of any other threads. I guess the stack of another thread could overrun and corrupt other data in the heap.

                   

                  I have since added watermark checks on all my app threads and also check that the main stack does not overflow. Neither showed any issue when the exception occurred.

                   

                  One exception fired after free had already returned, in the return sequence of the vPortFree call.

                  • 6. Re: CYW43907 Reset cause register explanation
                    NiMc_1688136

                    Could there be an issue running with a debugger relating to an internal bus?

                     

                     

                    After many changes I was still having issues. During my last round of testing, it appeared that the thread's stack or stack pointer value had become corrupted, based on the call tree in the debugger window and a failure in the return from vPortFree.

                     

                    As I removed code line by line, I found the issue was related to snprintf:

                     

                    dbgPkt.size += snprintf( (char*)dbgPkt.data+dbgPkt.size, MAX_PACKET_SIZE, "%llu,",( long long unsigned int ) time );
                    
                    
                    

                     

                    After looking at it, I realize I need to subtract the current size from the max size for the second parameter, but it should not matter: in this case dbgPkt.size was 11 and the buffer (MAX_PACKET_SIZE) is 128. There is no way this should produce an overrun, unless there is an issue in snprintf?

                     

                    I changed it to:

                    dbgPkt.size += snprintf( (char*)dbgPkt.data+dbgPkt.size, (MAX_PACKET_SIZE-dbgPkt.size), "%llu,",( long long unsigned int ) time );
                    

                     

                    I was able to run for 11 hours without an exception (debugger was disconnected)...
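One more thing to watch in this append pattern: snprintf returns the number of characters it would have written had there been enough room, not the number it actually stored. If the output is ever truncated, adding the return value to the size pushes it past the end of the buffer, and the next (MAX_PACKET_SIZE - size) wraps to a huge value once it is converted to size_t. A minimal sketch of a bounded append that clamps the length is below; `append_fmt` is a hypothetical helper, not something from WICED or FreeRTOS.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Append formatted text to buf, which holds cap bytes and currently
 * contains *len characters. Returns 1 if the whole output fit, or 0 if
 * it was truncated or formatting failed. *len never exceeds cap - 1. */
static int append_fmt(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    if (buf == NULL || len == NULL || *len >= cap) {
        return 0; /* nothing sensible can be written */
    }

    size_t room = cap - *len; /* bytes left, including the terminating NUL */
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf + *len, room, fmt, args);
    va_end(args);

    if (n < 0) {
        buf[*len] = '\0'; /* formatting error: leave the old contents intact */
        return 0;
    }
    if ((size_t)n >= room) {
        *len = cap - 1; /* truncated: the buffer is full up to its NUL */
        return 0;
    }
    *len += (size_t)n;
    return 1;
}
```

With a helper like this, the line above could become something like `append_fmt( (char*)dbgPkt.data, MAX_PACKET_SIZE, &len, "%llu,", (unsigned long long) time )` with `len` copied to and from dbgPkt.size, assuming dbgPkt.data really is MAX_PACKET_SIZE bytes long. A zero return also gives you a place to log that a packet was cut short.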