3 Replies Latest reply on Oct 3, 2015 6:53 AM by shatruddha

    WWD-Thread stalls

    natwati

      We are using a Murata SN8000 module connected via SPI (mode 0, 25 MHz) to a Cortex-M3 LPC1837. The WWD is ported from SDK 3.1.2 (including the matching patch from Murata). The board is a custom PCB with SPI flash, SDRAM, MCU and SN8000, and the MCU runs at 180 MHz. We are using an older version of FreeRTOS and lwIP (not taken from the SDK). The SPI interrupt has the second-highest interrupt priority and the SPI DMA interrupt has the highest. The WWD task has the highest task priority. There should be no long interrupt processing during the test.

       

      In rare cases, WLAN communication stops while the module is under continuous heavy load.

       

      In some cases the driver thread stalls at the line "result = host_rtos_get_semaphore( &wwd_transceive_semaphore, (uint32_t) WWD_THREAD_POLL_TIMEOUT, WICED_FALSE );" of wwd_thread_func(). The interrupt line can be either high or low in this case.

       

      In other cases the driver reported " Received a packet with a frametag which is wrong" or "gSPI underflow - packet size will be wrong".

       

       

      After a modification to wwd_thread's wwd_sdpcm_process_rx_packet() function that ensures the driver never runs out of RX buffers, the problem occurs less frequently (after 8 to 70 hours). We now simply drop data destined for the network stack if too many packets are queued.
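
      The drop policy described above can be illustrated with a small bounded queue. This is only a sketch, not the actual change to wwd_sdpcm_process_rx_packet(): the names rx_queue_t, rx_queue_offer(), rx_queue_take() and the limit RX_QUEUE_LIMIT are hypothetical, and the idea is simply that a packet which cannot be queued is released at once so its buffer returns to the pool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_QUEUE_LIMIT 4u   /* hypothetical high-water mark, not an SDK value */

typedef struct {
    void    *slots[RX_QUEUE_LIMIT];
    size_t   head;     /* index of the oldest queued packet */
    size_t   count;    /* number of packets currently queued */
    uint32_t dropped;  /* packets refused because the queue was full */
} rx_queue_t;

/* Returns true if the packet was queued for the network stack.
 * Returns false if the queue is full; the caller should then free the
 * packet buffer immediately instead of holding on to it. */
static bool rx_queue_offer(rx_queue_t *q, void *packet)
{
    if (q->count >= RX_QUEUE_LIMIT) {
        q->dropped++;
        return false;
    }
    q->slots[(q->head + q->count) % RX_QUEUE_LIMIT] = packet;
    q->count++;
    return true;
}

/* Returns the oldest queued packet, or NULL if the queue is empty. */
static void *rx_queue_take(rx_queue_t *q)
{
    if (q->count == 0) {
        return NULL;
    }
    void *packet = q->slots[q->head];
    q->head = (q->head + 1) % RX_QUEUE_LIMIT;
    q->count--;
    return packet;
}
```

      With this shape the number of RX buffers held by the stack side is capped at RX_QUEUE_LIMIT, and the dropped counter gives a rough measure of how often the limit is hit during a load test.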

       

      Another test case, which sends and receives Ethernet frames without a network stack, showed that the driver runs into errors after 3 to 10 minutes of operation when RX buffers are not freed.

       

      Is there any specification for how long RX buffers might be unavailable without interfering with operation?

      Are there any known issues when running out of RX buffers?

       

       

       

      For the cases in which wwd_thread stops at the line "result = host_rtos_get_semaphore( &wwd_transceive_semaphore, (uint32_t) WWD_THREAD_POLL_TIMEOUT, WICED_FALSE );" of wwd_thread_func(), the comment for the function wwd_thread_func() indicates that interrupts could be missed.

       

      > *  It simply calls @ref wwd_thread_poll_all to send/receive all waiting packets, then goes

      > *  to sleep.  The sleep has a 100ms timeout, causing the send/receive queues to be

      > *  checked 10 times per second in case an interrupt is missed.

       

      In SDK 3.3.1 the timeout became overridable:

       

      #ifndef WWD_THREAD_POLL_TIMEOUT

      #define WWD_THREAD_POLL_TIMEOUT      (NEVER_TIMEOUT)

      #endif
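
      Because the default is wrapped in #ifndef, a build can supply its own value before the driver's definition is reached, for example to get the 100 ms poll described in the comment quoted above. A minimal sketch, assuming the value is in milliseconds (as that comment suggests) and using a stand-in for NEVER_TIMEOUT so the fragment compiles on its own; where such an override belongs in a given build is not covered here:

```c
/* Project-level override; it must be seen before the driver's default below.
 * Assumption: the timeout is expressed in milliseconds. */
#define WWD_THREAD_POLL_TIMEOUT      (100)

/* Stand-in for the SDK's definition, only so this fragment compiles alone. */
#ifndef NEVER_TIMEOUT
#define NEVER_TIMEOUT                (0xFFFFFFFFu)
#endif

/* Default as quoted from the SDK; skipped here because of the override. */
#ifndef WWD_THREAD_POLL_TIMEOUT
#define WWD_THREAD_POLL_TIMEOUT      (NEVER_TIMEOUT)
#endif
```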

       

      In some of the error cases, we saw that the interrupt line was still high when the bus was disabled, so an edge interrupt was never triggered again.

      In the other code path, the 100 ms timeout should prevent the problem.

      Is there a known problem with missed interrupts in the path where the bus is disabled?
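
      One possible guard against a lost edge is to sample the interrupt line level before blocking and to skip the wait when it is already asserted. The sketch below is an assumption-laden illustration, not SDK code: choose_wait_timeout_ms() and POLL_TIMEOUT_MS are hypothetical names, and the result is meant to be passed as the timeout of the semaphore wait.

```c
#include <stdbool.h>
#include <stdint.h>

#define POLL_TIMEOUT_MS 100u   /* the 100 ms from the quoted comment */

/* Decide how long the driver thread may block before it polls again.
 * irq_line_high: current level of the module's interrupt pin, read by
 *                the caller through its own GPIO access (hypothetical). */
static uint32_t choose_wait_timeout_ms(bool irq_line_high)
{
    if (irq_line_high) {
        /* The edge has already happened (or was missed): do not block. */
        return 0u;
    }
    /* Otherwise wait for the interrupt, but never without a timeout, so
     * an edge lost between the level check and the wait is still
     * recovered at the next poll. */
    return POLL_TIMEOUT_MS;
}
```

      The level check alone leaves a small window between reading the pin and starting the wait, which is why the finite timeout is kept as a fallback.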

        • 1. Re: WWD-Thread stalls
          shatruddha

          I'm having a similar issue, where I suspect that the WWD thread stalls while everything else keeps running. The device goes into a deadlock and does not recover unless it is physically restarted.

          The problem occurs after hours of operation, while the device is under a communication stress test.

          • 2. Re: WWD-Thread stalls
            grga

            We will try to reproduce this in house. Any other data you have to reproduce it would be helpful.

            • 3. Re: WWD-Thread stalls
              shatruddha

              A similar issue occurred when I tried to start the device as a softAP and it failed. On failure, I retry:

              while (wiced_network_up(WICED_AP_INTERFACE,
                      WICED_USE_INTERNAL_DHCP_SERVER, &ap_ip_settings)
                      != WICED_SUCCESS) {
                  MYLOG(("trying to start as access point"));
                  wiced_rtos_delay_milliseconds(1000);
              }

               

              Here, too, it retries 6-7 times and then hangs. I have run into this a number of times but never paid attention to it. Now that I have hit a similar problem in the networking thread, I suspect it is the same issue.

              I think some queue is overflowing (I noticed this count of 6-7 every time) and the device stalls after that, but I'm not sure.