7 Replies Latest reply on Aug 18, 2017 11:02 AM by srdr

    FX3: lose data from GPIF-II to USB 3.0 even with 2 threads

    andrie_2270216

      Dear all,

         

      I am currently testing data transfer with the FX3, from GPIF2 to USB 3.0. I took the GpifToUsb project as a basis. Since I would like to have a clock of 32 MHz, I reduced the PCLK clock (clkdiv=12 instead of 4), and I work with a version of the C# Streamer application that I modified to do data logging.

         

      First, I noticed that 4 buffers of 32kB do not work properly. The throughput is very low. To solve this problem, I reduced the buffers to 4x8kB and I got almost the right throughput (32bits x 32MHz -> 128 MB/s, I got about 120-125).
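For reference, the theoretical figure quoted above (32 bits x 32 MHz -> 128 MB/s) can be reproduced with a small host-side helper. This is only a sketch: the name `gpif_bytes_per_second` is made up for illustration, and it assumes one full bus word is transferred on every PCLK cycle with no stalls.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical GPIF II payload rate in bytes per second.
 * Assumption: one bus-wide word per PCLK cycle, no wait states. */
static uint64_t gpif_bytes_per_second(uint32_t bus_width_bits, uint64_t pclk_hz)
{
    assert(bus_width_bits % 8u == 0u);   /* whole bytes only */
    return (uint64_t)(bus_width_bits / 8u) * pclk_hz;
}
```

With a 32-bit bus this gives 128,000,000 bytes/s at 32 MHz and 64,000,000 bytes/s at 16 MHz, so the measured 120-125 MB/s is a few percent under the ideal rate.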

         

      So, my first question is: why does the system not work well with large buffers but work well with small buffers? It is really weird, because I think it should be the opposite...

         

      Then I connected an 8-bit counter to 8 GPIF pins to see if I lost data. Because of latency and long wires (I am using a breadboard), I had to reduce the PCLK to 16 MHz. Then, to get a throughput of about 64 MB/s, I had to select buffers of 4x2kB because 4x8kB gave me a very low throughput...
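A check like the counter test above can be automated on the host side. The sketch below is an assumption about how such a check might look, not the code used in this thread: `count_missing_samples` is a hypothetical helper that takes the counter byte extracted from each received word and counts how many counter values were skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count samples missing from a capture of a free-running 8-bit counter.
 * samples[i] is the counter byte taken from the i-th received word.
 * Limitations: a gap of 256 samples or more aliases to a smaller value,
 * and a repeated value is interpreted as 255 missing samples. */
static size_t count_missing_samples(const uint8_t *samples, size_t n)
{
    size_t missing = 0;
    assert(samples != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        /* 8-bit subtraction wraps modulo 256, so 255 -> 0 is a step of 1. */
        uint8_t step = (uint8_t)(samples[i] - samples[i - 1]);
        missing += (size_t)(uint8_t)(step - 1u);
    }
    return missing;
}
```

A jump from 11 to 40 between two consecutive words is reported as 28 missing samples, while a normal 255 to 0 rollover is reported as 0.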

         

      So, with the original firmware, I lose 28 words of 32 bits every time a buffer is full. This seems to be normal, given that we have only 1 thread/socket and there is a latency while switching buffers (as explained in AN75779).

         

      So I made some modifications in order to use 2 sockets/threads to transfer data from GPIF2 to FX3 memory. Here is the code using CyU3PDmaMultiChannelConfig_t dmaMultiCfg; (instead of CyU3PDmaChannelConfig_t dmaCfg;)

         

          dmaMultiCfg.size  = 2048;
          dmaMultiCfg.count = 4;
          dmaMultiCfg.validSckCount = 2;
          dmaMultiCfg.prodSckId [0] = (CyU3PDmaSocketId_t)CY_U3P_PIB_SOCKET_0;
          dmaMultiCfg.prodSckId [1] = (CyU3PDmaSocketId_t)CY_U3P_PIB_SOCKET_1;
          dmaMultiCfg.consSckId [0] = (CyU3PDmaSocketId_t)CY_FX_EP_CONSUMER_SOCKET;
          dmaMultiCfg.dmaMode = CY_U3P_DMA_MODE_BYTE;
          dmaMultiCfg.prodHeader = 0;
          dmaMultiCfg.prodFooter = 0;
          dmaMultiCfg.consHeader = 0;
          dmaMultiCfg.prodAvailCount = 0;

         

          dmaMultiCfg.notification = CY_U3P_DMA_CB_CONS_SUSP;
          dmaMultiCfg.cb = GpifToUsbDmaMultiCallback;
          apiRetStatus = CyU3PDmaMultiChannelCreate (&glDmaMultiChHandle, CY_U3P_DMA_TYPE_AUTO_MANY_TO_ONE, &dmaMultiCfg);

         


          if (apiRetStatus != CY_U3P_SUCCESS)
          {
              CyU3PDebugPrint (4, "CyU3PDmaMultiChannelCreate failed, Error code = %d\n", apiRetStatus);
              CyFxAppErrorHandler(apiRetStatus);
          }

         

          /* Set DMA Channel transfer size */
          apiRetStatus = CyU3PDmaMultiChannelSetXfer (&glDmaMultiChHandle, CY_FX_GPIFTOUSB_DMA_TX_SIZE, 0);
          if (apiRetStatus != CY_U3P_SUCCESS)
          {
              CyU3PDebugPrint (4, "CyU3PDmaMultiChannelSetXfer failed, Error code = %d\n", apiRetStatus);
              CyFxAppErrorHandler(apiRetStatus);
          }

         

      As you can see, I basically modified the code to use the "DmaMulti" objects instead of "Dma". For the state machine, I made some adaptations. There are 2 threads: the 1st thread fills in the data until the !DMA_RDY_TH0 flag is set, then the state machine switches to the 2nd thread until the !DMA_RDY_TH1 flag is set. Please see the attached file.

         

      With this configuration, I have better results, but they are not perfect. Instead of losing 28 words of 32 bits while the GPIF II switches buffers, I lose only 1 word of 32 bits.

         

      So I have 2 issues:

         

      - Why do I have to decrease the buffer size in order to have a correct transfer?

         

      - Why do I lose 1 word of 32 bits when I use 2 threads?

         

      Thank you in advance,

         

      Best regards,

         

      Christian

        • 1. Re: FX3: lose data from GPIF-II to USB 3.0 even with 2 threads
          cngaoxf_2607281

          Hi!

             

          I also encounter the same problem: lost data!
          My FPGA data rate is 44 MB/s (bytes). First I store the data in a 2K-deep FIFO (32 bits wide), then I read the FIFO with a 90 MHz clock into the GPIF (32 bits) and use the usb3014 DMA thread to send the data to the USB bus. Finally, I use pBulkEpIn->XferData(pBulkBuf, nBulkLen) to capture the USB data, with nBulkLen set to 4*1024; when I use any other nBulkLen, I can't get data. The result is that I can only get about 30 MB/s on the PC.

          • 2. Re: FX3: lose data from GPIF-II to USB 3.0 even with 2 threads
            andrie_2270216

            Hi!

               

            Glad to know that I am not the only one who encounters issues! With certain buffer sizes, I can get almost all my data, but I lose 1 sample every time the DMA has to change buffers, even though I use 2 DMA threads as suggested in the UVC example!

               

            I am back from vacation and I will try to find out why! If anyone has a suggestion about my first problem (loss of 1 sample) or my second problem (same as Cngaoxf_2607281), it would be nice!

               

            Best,

               

            Christian

            • 3. Re: FX3: lose data from GPIF-II to USB 3.0 even with 2 threads
              andrie_2270216

              I implemented the PIB error handling that I found in AN65974 (p20/68), and I get CYU3P_PIB_ERR_THR0_WR_OVERRUN and CYU3P_PIB_ERR_THR1_WR_OVERRUN, or only one of them if I use only 1 thread.

                 

              How can I solve this problem?

                 

              I saw that I have to set DMA descriptors in the firmware, but I can't find any example, even in the "cyfxbulklpautomanytoone" example... Could this be the reason why I always lose 1 sample when I switch the DMA buffer?

                 

              EDIT: OK, according to what I read, the DMA descriptors are set up when the DMA channel is created, and this complexity is hidden from the user. Is that right?

                 

              Since I use DMA_RDY_TH0/1 as the flag to move to the next state of the state machine, I suspect a latency before the flag is set. The producer then tries to write into a buffer that is already full (that is why I get a WRITE_OVERRUN error and lose 1 sample), and the next buffer starts to be filled 1 clock too late.

                 

              About the throughput that drops very low when I choose a large DMA buffer: on my oscilloscope, the DMA_RDY_TH0/1 flag is mostly low with some peaks at high state (meaning the buffer has been filled and the flag goes high until the consumer has emptied it), which is fine. But after a few transfers, the same flag is stuck high, which means the buffer cannot be filled, and that is why I get such a low throughput. I guess I need to fix this first before going further...

                 

              Could any Cypress employee help me? It would be nice!

              • 4. Re: FX3: lose data from GPIF-II to USB 3.0 even with 2 threads
                srdr
                1. As you know, throughput depends on various factors, as described in AN86947 (http://www.cypress.com/documentation/application-notes/an86947-optimizing-usb-30-throughput-ez-usb-fx3). You said that the throughput increased when you reduced the buffer size. What burst size, packets per xfer and xfers to queue did you set with the 4x32 KB DMA buffer size at 32 MHz PCLK? Did you keep the same values when you switched to the 4x8 KB buffer size? With the 4x32 KB buffers, have you checked the throughput while varying the packets per xfer and xfers to queue values? If yes, is there any improvement?

                2. Yes, you have found the answer for the data loss of 1 word (32 bits).
                • 5. Re: FX3: lose data from GPIF-II to USB 3.0 even with 2 threads
                  andrie_2270216

                  Dear Srdr,

                     

                  Thank you very much for your message!

                     

                  Yes, I know the throughput depends on many factors. In my test, I do not need the maximum throughput, but I do need to get 64 MB/s with 32 bits at 16 MHz or 128 MB/s with 32 bits at 32 MHz, because this is the amount of data that I have to get from the GPIF-II. I can almost achieve this throughput depending on the parameters; I guess the difference is due to the lost data (the state machine transitions are made with the DMA_RDY_THx flags). But with some parameters, the buffers are not emptied for a long time, which causes the throughput to go down. I took some pictures of the oscilloscope screen to show you what I mean. The yellow trace is the DMA_RDY_TH0 flag and the blue one is the DMA_RDY_TH1 flag.

                     

                  Here are the results with Streamer (Packets per Xfer is set to 256 and Xfers to Queue is set to 64; I get almost the same results with lower values):

                     

                  16 MHz 32 bits: (theoretical throughput = 64MB/s)
                  4x2KB -> 62'300 KB/s (see picture 16MHz 4x2048), everything seems to be OK
                  4x4KB -> 9'000 KB/s (see pictures 16MHz 4x4096 and 16MHz 4x4096 zoom), everything is OK for some time and then the buffers stall for a long time, which is the cause of the very low throughput. Do you know why?
                  4x8KB -> 12'200 KB/s (buffer issues)
                  4x16KB -> 18'000 KB/s (buffer issues)
                  4x32KB -> does not work

                     

                  32 MHz 32 bits: (theoretical throughput = 128MB/s)
                  4x2KB -> 124'700 KB/s (OK)
                  4x4KB -> 124'800 KB/s (OK)
                  4x8KB -> 124'900 KB/s (OK)
                  4x16KB -> 24'500 KB/s (buffer issues)
                  4x32KB -> does not work

                     

                  I can work with 4x32KB only with 1 thread, but as soon as I try with 2 threads 'many-to-one', I can reach only 4x16KB.

                     

                  I hope you will be able to help me!

                     

                  Thanks in advance.

                     

                  Christian

                     

                  • 6. Re: FX3: lose data from GPIF-II to USB 3.0 even with 2 threads
                    andrie_2270216

                    Dear Srdr,

                       

                    I answered you, but without using the "reply" button (because there was a bug when I tried it that way). So I am posting this message to be sure you get a notification!

                       

                    Best,

                       

                    Christian

                    • 7. Re: FX3: lose data from GPIF-II to USB 3.0 even with 2 threads
                      srdr

                      EDITED:

                       

                      Christian,

                       

                      1. Lower throughput with larger buffer size (your first issue in this thread):

                      I reproduced your test and I see the same results as you.

                      With the reduced clock frequency (32 MHz), a 32 KB buffer takes a long time to fill, and until then USB is waiting for the buffer.

                      If you decrease the buffer size to 8 KB and increase the buffer count to 16 to keep the same total buffer space, you can achieve around 123 MB/s throughput.

                       

                      Here I used a one-to-one channel only.

                       

                      dmaCfg.size  = 8192;// CY_FX_DMA_BUF_SIZE;

                      dmaCfg.count = 16;//_FX_DMA_BUF_COUNT;
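To put point 1 in numbers, the time a producer needs to fill one DMA buffer can be estimated as below. This is a rough sketch, assuming the GPIF writes one bus word on every PCLK cycle without stalls; `buffer_fill_time_us` is a hypothetical helper name, not an SDK function.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds to fill one DMA buffer from the GPIF side.
 * Assumption: one bus-wide word per PCLK cycle, no wait states. */
static double buffer_fill_time_us(uint32_t buffer_bytes,
                                  uint32_t bus_width_bits,
                                  double pclk_mhz)
{
    assert(bus_width_bits % 8u == 0u && bus_width_bits > 0u);
    assert(pclk_mhz > 0.0);
    /* bytes per clock times clocks per microsecond = bytes per microsecond */
    double bytes_per_us = (bus_width_bits / 8.0) * pclk_mhz;
    return (double)buffer_bytes / bytes_per_us;
}
```

Under these assumptions, a 32 KB buffer on a 32-bit bus at 32 MHz takes 256 us to fill, while an 8 KB buffer takes 64 us, so the USB side gets data to send four times as often with the smaller buffers.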

                       

                      2. Losing one sample at each buffer switch and getting the OVERRUN error:

                       

                      This may be due to socket switching. Is it possible to put a counter on the external processor, so that it stops driving data when the counter hits its limit and waits for the DMA flag to be deasserted? This will avoid the data loss.
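One way to picture the counter suggested above is a small behavioral model of the external data source, written in C for illustration. Everything here is an assumption: the type `producer_t`, the functions, the flag polarity (`busy_flag` true means the FX3 cannot accept data) and the `guard_cycles` pause meant to cover the flag latency are all hypothetical and are not taken from the FX3 SDK or from the attached state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t words_per_buffer; /* DMA buffer size divided by bus width in bytes */
    uint32_t guard_cycles;     /* hypothetical idle cycles to cover flag latency */
    uint32_t words_in_buffer;  /* words driven into the current buffer */
    uint32_t guard_left;       /* idle cycles still to wait */
} producer_t;

static void producer_init(producer_t *p, uint32_t buffer_bytes,
                          uint32_t bus_bytes, uint32_t guard_cycles)
{
    assert(bus_bytes != 0u && buffer_bytes % bus_bytes == 0u);
    p->words_per_buffer = buffer_bytes / bus_bytes;
    p->guard_cycles = guard_cycles;
    p->words_in_buffer = 0u;
    p->guard_left = 0u;
}

/* Advance the model by one clock; returns true if a word is driven. */
static bool producer_clock(producer_t *p, bool busy_flag)
{
    if (p->guard_left > 0u) {            /* still inside the guard window */
        p->guard_left--;
        return false;
    }
    if (p->words_in_buffer == p->words_per_buffer) {
        if (busy_flag) {
            return false;                /* next buffer not available yet */
        }
        p->words_in_buffer = 0u;         /* start filling the next buffer */
    }
    p->words_in_buffer++;
    if (p->words_in_buffer == p->words_per_buffer) {
        p->guard_left = p->guard_cycles; /* counter hit: pause the source */
    }
    return true;
}
```

The point of the design is that the source knows the buffer boundary from its own word count and does not depend on seeing the flag change in time. The cost is that the source must be able to hold its data during the pause.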

                       

                      3. Issue with 4x32 KB buffer size:

                       

                      Since you are using a many-to-one DMA channel and configured the buffer size as 4x32 KB for each channel (1. Prod_0 to Cons_0, 2. Prod_1 to Cons_0), it effectively needs 4 x 32 KB x 2 = 256 KB.

                      By default, chips with 512 KB of RAM have only 224 KB allocated for the buffer area. You can see this in the cyfx.c file.
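The arithmetic in point 3 can be checked with a short helper. This is a sketch based only on the figures given in this reply (buffers allocated per producer socket, 224 KB default buffer area); the macro and function names are made up for illustration, and the real limit is lower still because the buffer area is shared with other allocations.

```c
#include <assert.h>
#include <stdint.h>

/* Default DMA buffer area quoted in this thread for 512 KB parts. */
#define DEFAULT_BUFFER_AREA_BYTES (224u * 1024u)

/* Total buffer memory for a many-to-one channel, assuming `count`
 * buffers of `size` bytes are allocated for each producer socket. */
static uint64_t many_to_one_buffer_bytes(uint32_t producers,
                                         uint32_t count, uint32_t size)
{
    assert(producers >= 1u);
    return (uint64_t)producers * count * size;
}

/* Returns 1 if the request fits in the default buffer area, else 0. */
static int fits_default_buffer_area(uint64_t bytes)
{
    return bytes <= DEFAULT_BUFFER_AREA_BYTES;
}
```

With 2 producers, 4x32 KB needs 256 KB and does not fit, whereas 4x16 KB needs 128 KB and does. A single producer with 4x32 KB also needs only 128 KB, which matches the observation that 4x32KB works with 1 thread but not with 2.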