4 Replies Latest reply on Nov 4, 2016 4:06 AM by magnusb

    RESP1 error codes

    mcmagnus

      Hello

       

      Is there a list of response codes sent from a BCM943341? Pretty often when the Bluetooth part is active and I also try to send data over WiFi, it ends up in the error handling part of the sdio_irq in wwd_SDIO.c. The bit 0x800 in the response (RESP1) from CMD53 seems to indicate an error, and then a failure is indicated back to the transfer function, which will retry 5 times. However, once it happens, it will never work again until I reset the system. I even added a 100ms delay before retrying and then retry up to 50 times without success.

       

      The error code (RESP1) I get is usually 0x2800, but I have also seen 0xD00 and 0x400D00.
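
      For reference, here is a minimal sketch of how these values could be decoded. It assumes that RESP1 holds the R5 response to CMD53 with the response flags in bits 15:8 and the data byte in bits 7:0, as laid out in the SDIO Simplified Specification. The macro names and the `decode_r5` helper are made up for illustration and are not part of the WICED SDK. Under that assumption, 0x2800 reads as ERROR with IO_CURRENT_STATE = TRN, and 0xD00 as ERROR plus OUT_OF_RANGE with the reserved bit 10 also set. Bits above 15 (as in 0x400D00) fall outside the R5 flag field, so this sketch ignores them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed R5 flag layout inside RESP1 (bits 15:8); names are illustrative. */
#define R5_COM_CRC_ERROR    (1u << 15)
#define R5_ILLEGAL_COMMAND  (1u << 14)
#define R5_IO_STATE_SHIFT   12u
#define R5_IO_STATE_MASK    (3u << R5_IO_STATE_SHIFT)
#define R5_ERROR            (1u << 11)  /* the 0x800 bit mentioned above */
#define R5_RESERVED         (1u << 10)
#define R5_FUNCTION_NUMBER  (1u << 9)
#define R5_OUT_OF_RANGE     (1u << 8)

typedef struct {
    bool    com_crc_error;
    bool    illegal_command;
    uint8_t io_state;        /* 0 = DIS, 1 = CMD, 2 = TRN, 3 = reserved */
    bool    error;
    bool    function_number;
    bool    out_of_range;
    uint8_t data;            /* read/write data byte, bits 7:0 */
} r5_status_t;

/* Split a raw RESP1 value into the individual R5 fields. */
static r5_status_t decode_r5(uint32_t resp1)
{
    r5_status_t s;
    s.com_crc_error   = (resp1 & R5_COM_CRC_ERROR) != 0;
    s.illegal_command = (resp1 & R5_ILLEGAL_COMMAND) != 0;
    s.io_state        = (uint8_t)((resp1 & R5_IO_STATE_MASK) >> R5_IO_STATE_SHIFT);
    s.error           = (resp1 & R5_ERROR) != 0;
    s.function_number = (resp1 & R5_FUNCTION_NUMBER) != 0;
    s.out_of_range    = (resp1 & R5_OUT_OF_RANGE) != 0;
    s.data            = (uint8_t)(resp1 & 0xFFu);
    return s;
}
```

      Logging the decoded fields rather than the raw word makes it easier to see whether the failures share one cause (for example, whether ERROR always comes with OUT_OF_RANGE).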

       

      It always happens when Bluetooth is active, but also sometimes even without Bluetooth.

       

      I'm using SDK 3.5.2

       

      Any suggestions as to how to remedy the problem once it arises?

        • 1. Re: RESP1 error codes
          vens

          Hello, mcmagnus,

           

          1) Could you share some more information about the sections of code where you are getting these response codes?

          2) A scenario in which this can be reproduced easily would also be useful. Are you running one of the example apps, or is it your own application?

           

          Thanks,

          Venkat

          • 2. Re: RESP1 error codes
            mcmagnus

            Hi vens

             

            I did some git forensics over the weekend and found out a few interesting things.

             

            It mostly happens when I use an external PSRAM as the source of the DMA transfer. After some measurements, I realized that the external PSRAM is 5-6x slower than the internal RAM in the STM32F4 I use. When I use the internal RAM it works much better. However, it's still possible to reproduce the problem if I start a CPU- and RAM-intensive task (RTOS). It might then take an hour or two, instead of just a minute with the PSRAM.

             

            The first sign of trouble is an interrupt from SDIO with STA = 0x400010, which is SDIOIT and TXUNDERR. So the problem seems to be that the FIFO becomes empty during a transfer. I found a thorough thread about this on the STMicroelectronics forum: "SDIO+DMA FIFO Error (TX Underrun)..."
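
            As a quick check of that reading, here is a small sketch that splits an SDIO->STA value into the flags of interest. The bit positions (CTIMEOUT = bit 2, DTIMEOUT = bit 3, TXUNDERR = bit 4, SDIOIT = bit 22) are taken from the STM32F4 reference manual as I understand it. The `STA_*` names and helper functions are illustrative stand-ins, not the CMSIS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SDIO_STA bit positions (STM32F4); names chosen to avoid clashing with CMSIS. */
#define STA_CTIMEOUT  (1u << 2)   /* command response timeout */
#define STA_DTIMEOUT  (1u << 3)   /* data timeout */
#define STA_TXUNDERR  (1u << 4)   /* transmit FIFO underrun */
#define STA_SDIOIT    (1u << 22)  /* SDIO card interrupt received */

/* Return only the error flags this sketch knows about. */
static uint32_t sta_error_bits(uint32_t sta)
{
    return sta & (STA_CTIMEOUT | STA_DTIMEOUT | STA_TXUNDERR);
}

/* True when the transmit FIFO ran empty during a transfer. */
static bool sta_has_tx_underrun(uint32_t sta)
{
    return (sta & STA_TXUNDERR) != 0;
}
```

            With STA = 0x400010, `sta_error_bits` returns only the TXUNDERR bit, so neither timeout flag was set at that point.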

             

            I'll read that carefully some day and see if I can find a good solution.

             

            But as I said, if you want to reproduce it, you should be able to do it by sending lots of data over WiFi while at the same time running one or more CPU and RAM intensive tasks. I'm also using the DMA2D for color conversion and then sending a display buffer to a display over SPI with another DMA channel, but I'm not sure it's required.

            • 3. Re: RESP1 error codes
              mcmagnus

              A similar problem arose today. When the command sent to SDIO times out (after 64 SDIO clock cycles according to the ST specs), the CTIMEOUT flag is set in the SDIO->STA register. However, the check for CTIMEOUT in wwd_SDIO.c is never reached, because the semaphore times out first and the code jumps straight to exit. Moving the check inside the semaphore error handling block fixes this problem.

               

              So instead of:

               

                      result = host_rtos_get_semaphore( &sdio_transfer_finished_semaphore, (uint32_t) 50, WICED_TRUE );
                      if ( result != WWD_SUCCESS )
                      {
                          goto exit;
                      }

                      if ( sdio_transfer_failed == WICED_TRUE )
                      {
                          goto restart;
                      }

                      /* Check if there were any SDIO errors */
                      if ( ( SDIO->STA & ( SDIO_STA_DTIMEOUT | SDIO_STA_CTIMEOUT ) ) != 0 )
                      {
                          goto restart;
                      }

               

              Change to:

               

                      result = host_rtos_get_semaphore( &sdio_transfer_finished_semaphore, (uint32_t) 50, WICED_TRUE );
                      if ( result != WWD_SUCCESS )
                      {
                          /* Check if there were any SDIO errors */
                          if ( ( SDIO->STA & ( SDIO_STA_DTIMEOUT | SDIO_STA_CTIMEOUT ) ) != 0 )
                          {
                              goto restart;
                          }

                          goto exit;
                      }

                      if ( sdio_transfer_failed == WICED_TRUE )
                      {
                          goto restart;
                      }

               

               

              ... and all goes well.

              • 4. Re: RESP1 error codes
                magnusb

                It seems I got to the bottom of this one now.

                 

                As I indicated above, to reproduce it you need to send data pretty fast over WiFi while at the same time exercising DMA2D with a color conversion and then sending the color-converted 565 bitmap to an LCD over SPI using DMA. The DMA2D fetches the data from FMC/PSRAM and can store the converted data either in internal SRAM or back in external PSRAM. I've tried both, and both trigger the crash.

                 

                I'm thinking the Bus Matrix of the STM32F439 I'm using has something to do with it. I saw the diagram on page 62 of the "RM0090 Ref manual" and noticed that the DMA2D can't access the same SRAM bank as the DMA which is sending data from SRAM to the WiFi chip over SDIO. However, there's something else going on because I tried to set the DeadTime of the DMA2D transfer to 1 or 4 to give the DMA enough time to fill up the FIFO buffer, but it's still indicating TXUNDERR now and then.

                 

                The only working solution I've come up with is a mutex disabling simultaneous operation from LCD/DMA2D and the SDIO DMA.

                 

                I also took some timings using DWT->CYCCNT, and it really does take a very long time before TXUNDERR triggers (SDIO->DTIMER is set to 0xFFFFFFFF). Normally it takes around 830 cycles from sending the CMD until the sdio_irq function fires and I can start the DMA transfer. However, when the bug hits, CYCCNT has almost wrapped around a full cycle (i.e. 0xFFFFFFFF - ~5,000,000, which is around ... 25 seconds at 168 MHz??). But that doesn't make sense at all! Is there an STM bug buried here?

                 

                Another thing: I take quite a few measurements when starting SDIO transfers, and as far as I can see, between the last working transfer and the one that fails the RTC doesn't advance a single millisecond (I use 1/256 s resolution), while CYCCNT increases by ~5,000,000, which should be ~30 ms! It really looks like the SDIO CMD is started when CYCCNT reads ~5,000,000 too high, and then the counter suddenly jumps back to the correct time, which in turn confuses the SDIO into thinking it has timed out...
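
                To make the two figures (~25 s versus ~30 ms) easier to compare, here is a small sketch of the cycle arithmetic. It assumes a 168 MHz core clock and that both timestamps are raw 32-bit DWT->CYCCNT reads. The helper names are hypothetical. The point it illustrates is that a counter value that steps backwards by a few million cycles looks like a nearly full wrap (about 25 s) when the difference is taken unsigned, and like roughly -25 to -30 ms when it is taken signed.

```c
#include <stdint.h>

#define CPU_HZ 168000000u  /* assumed core clock: 168 MHz */

/* Unsigned elapsed cycles; correct across a single 32-bit wrap of CYCCNT. */
static uint32_t cyc_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Signed view of the same difference; negative if 'end' appears earlier.
 * The conversion relies on the usual two's-complement behaviour. */
static int32_t cyc_signed_delta(uint32_t start, uint32_t end)
{
    return (int32_t)(end - start);
}

/* Convert a cycle count to milliseconds at CPU_HZ. */
static double cyc_to_ms(uint32_t cycles)
{
    return (double)cycles * 1000.0 / (double)CPU_HZ;
}
```

                For example, with a start value of 1522518667 and a later read of 1518291106, the signed delta is -4227561 cycles (about -25 ms), while the unsigned delta is 4290739735 cycles, which converts to roughly 25.5 s.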

                 

                Seriously, *is* there a STM bug here?

                 

                Some logs:

                 

                These lists show the last few SDIO transfers before a failed one. It shows:

                Time of the printout, Thread, SDIO transfer num, RTC time of SDIO transfer, data, size, dir, startCYCCNT, dmaCYCCNT when DMA is started, the deltaCYCCNT between the two and the delta time to the error detection (when TXUNDERR has triggered an irq).

                 

                20:30:11.885 WWD: 416132: 20:30:11.771 data 20029d08, size 4, dir 0, sC 1518222439, dmaC 1518223272, dC 833, errC 0

                20:30:11.889 WWD: 416133: 20:30:11.771 data 2000aa7c, size 1532, dir 1, sC 1518229091, dmaC 1518229926, dC 835, errC 0

                20:30:11.889 WWD: 416134: 20:30:11.771 data 2001303c, size 1532, dir 1, sC 1518259755, dmaC 1518260588, dC 833, errC 0

                20:30:11.889 WWD: 416135: 20:30:11.771 data 20009d1c, size 1532, dir 1, sC 1522518667, dmaC 1518291106, dC -4227561, errC -4227196

                 

                another one:

                21:02:12.705 WWD: 280156: 21:02:12.430 data 2000890c, size 1532, dir 1, sC 1366818277, dmaC 1366819110, dC 833, errC 0

                21:02:12.705 WWD: 280157: 21:02:12.430 data 2000ed5c, size 1532, dir 1, sC 1366848793, dmaC 1366849626, dC 833, errC 0

                21:02:12.709 WWD: 280158: 21:02:12.430 data 2000f40c, size 1532, dir 1, sC 1366879317, dmaC 1366880150, dC 833, errC 0

                21:02:12.709 WWD: 280159: 21:02:12.430 data 20011c2c, size 1532, dir 1, sC 1372087154, dmaC 1366910688, dC -5176466, errC -5175859

                 

                A careful look reveals that the DMA transfer of the failed one *is* actually started, but only after CYCCNT has reverted to the correct value.

                 

                Let me know if you need more info about how to read the logs.

                 

                Any insights?
