5 Replies Latest reply on Apr 16, 2011 11:40 PM by anandsrinivasana_

    Slave FIFO OUT throughput



      We have a design with an FX2 (CY7C68013A) and an FPGA. We use the Slave FIFO interface and are mainly interested in high OUT throughput.


      I currently get around 27 MB/s, but the Streamer example is said to reach 32 MB/s. On the PC side I believe I have everything set up as in the Streamer example.


      I use a programmable flag for almost empty so I can keep SLRD active continuously for most of the transfer between FX2 and FPGA.


      EP2 and EP6 are configured as bulk endpoints with 4x 512-byte buffers.


      I'm using an external IFCLK running at 40 MHz.


      Where should I look for improvement? Internal instead of external IFCLK? Faster IFCLK?
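
      As a rough check on whether IFCLK could be the limit, the peak interface rate can be estimated. This is a minimal sketch that assumes one word is read per IFCLK cycle while SLRD stays asserted; the thread does not state the bus width, so both 8-bit and 16-bit are shown, and the function name is only illustrative.

```python
def fifo_bandwidth_mb_s(ifclk_hz: float, bus_width_bits: int) -> float:
    """Peak slave FIFO rate in MB/s (1 MB = 10**6 bytes).

    Assumption: one word of bus_width_bits is transferred on every IFCLK cycle.
    """
    bytes_per_cycle = bus_width_bits // 8
    return ifclk_hz * bytes_per_cycle / 1e6


# The bus width is not given in the thread, so try both possibilities.
for width in (8, 16):
    peak = fifo_bandwidth_mb_s(40e6, width)
    print(f"{width}-bit bus at 40 MHz: {peak:.0f} MB/s peak")
```

      Under these assumptions an 8-bit bus peaks at 40 MB/s and a 16-bit bus at 80 MB/s, so with a 16-bit bus the interface clock is unlikely to be what limits a 27 MB/s transfer.
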


      I also found out that when I use two devices the combined throughput can go up to 41 MB/s. Why can't a single FX2 get this bandwidth?


      Any help is appreciated.



        • 1. Re: Slave FIFO OUT throughput

          Are you saying that your application reports 27 MB/s, but with Streamer you see 32 MB/s?


          You mention a 40 MHz external IFCLK. Are you streaming data all the time, i.e. what is the data throughput at the GPIF interface?



          • 2. Re: Slave FIFO OUT throughput

             When I use Streamer with the Keil demoboard and CYStream.iic I see 32 MB/s.


            When I use my own application, which sends 4 chunks of 16 MB of data, I see 27 MB/s. I've set XferSize to 1 MB, and I time 4 loops, each consisting of 16 consecutive StartXfer calls followed by 16 WaitXfer and FinishXfer calls. Pseudocode:

            for i = 1 to 4
                get start time
                for j = 1 to 16
                    StartXfer (1MB, overlapped)
                for j = 1 to 16
                    WaitXfer
                    FinishXfer
                get end time
                accumulate transfer times
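
            The measurement loop above can be sketched in runnable form. This is only an illustration of the timing logic: `StubEndpoint`, `start_xfer` and `wait_and_finish` are hypothetical stand-ins, not the real driver API, and the clock is injected so the arithmetic can be checked without hardware.

```python
import time


class StubEndpoint:
    """Hypothetical stand-in for the OUT endpoint; it only counts bytes."""

    def __init__(self):
        self.bytes_done = 0
        self.pending = []

    def start_xfer(self, buf):
        # Queue a transfer and return a handle (here: its index in the queue).
        self.pending.append(len(buf))
        return len(self.pending) - 1

    def wait_and_finish(self, handle):
        # Pretend the queued transfer completed and account for its bytes.
        self.bytes_done += self.pending[handle]


def measure(endpoint, loops=4, queued=16, xfer_size=1 << 20,
            clock=time.perf_counter):
    """Return (total_bytes, elapsed_seconds) over all timed loops."""
    buf = bytes(xfer_size)
    total, elapsed = 0, 0.0
    for _ in range(loops):
        start = clock()
        # Queue all transfers first so several are outstanding at once.
        handles = [endpoint.start_xfer(buf) for _ in range(queued)]
        for handle in handles:
            endpoint.wait_and_finish(handle)
        elapsed += clock() - start
        total += queued * xfer_size
    return total, elapsed


# Usage with a fake clock that makes every loop appear to take 0.5 s.
ticks = iter([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
total, elapsed = measure(StubEndpoint(), clock=lambda: next(ticks))
print(f"{total / elapsed / 1e6:.1f} MB/s")
```

            Only the queue-and-wait part is inside the timed region, so any work done on the data before the loop does not count toward the reported rate.
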
            • 3. Re: Slave FIFO OUT throughput

              CyStream just accepts the packets and discards them. In the other direction it generates junk packets, which Streamer discards.


              In your case there are other dependencies: your firmware, a host application that does something with each packet, the interface and so on are all different. You should be able to change the VID/PID to 04B4/1003, bind the device to Streamer and send packets. This would give you a good idea of the achievable throughput.


              Comparing CyStream + Streamer with your application + firmware means comparing two unknowns, so work through them one at a time to isolate where the improvement can be made.





              • 4. Re: Slave FIFO OUT throughput

                Even though what I find is very weird, I am getting closer.




                I have a DLL with a function that takes the data from an application, splits it into chunks and sends it to the FX2. When I pass an array of >1MB and access at least one byte in that array, the transfer slows down. If I do not touch the array contents but instead use a dummy array within the DLL, the transfer is fast. If I fill this dummy array through another function in chunks of <1MB, it stays fast. If I pass a large array even once, the transfer becomes slower. It doesn't seem to matter how much time passes between the large array access and the transfer. It looks as if passing a large array triggers some protection algorithm that slows everything down. The slowdown even scales with how much bigger the array is.
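
                For reference, the chunking step itself can be written so that it neither copies nor reads the array contents. This is a minimal sketch, not the DLL's actual code and not a diagnosis of the slowdown; `iter_chunks` and the 64 KB default are assumptions chosen for illustration.

```python
def iter_chunks(data, chunk_size=64 * 1024):
    """Yield zero-copy views of data, each at most chunk_size bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # cast("B") gives a flat byte view regardless of the element type.
    view = memoryview(data).cast("B")
    for offset in range(0, len(view), chunk_size):
        # Slicing a memoryview creates a new view, not a copy of the bytes.
        yield view[offset:offset + chunk_size]


# Usage: split a 150,000-byte buffer and report the chunk lengths.
payload = bytes(150_000)
print([len(chunk) for chunk in iter_chunks(payload)])
```

                Each chunk would then be handed to the transfer call in turn, so the large array is only ever referenced through small views.
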




                Does anyone have an idea what it is I am facing here?

                • 5. Re: Slave FIFO OUT throughput

                  I'm not sure about the 1-byte access problem you're facing. It is most probably due to a protection mechanism, either in the OS or in the variable type used in your DLL, that prevents data corruption, i.e. it allows only one handle to access the array at a time.


                  As for passing the large array, the slowdown is most probably due to the buffer size allocated in the host controller driver. By default, CyUSB.sys requests a buffer of 8 * endpoint size in the host controller driver for transfers on a particular endpoint. When you trigger a large transfer, every time this buffer fills it is transferred to memory in the OS, and only then can the host controller accept more data. This process slows down the transfer.


                  Try changing this buffer allocation using the XferSize variable. 64 KB is the recommended value.
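
                  To get a feel for the difference, the number of times the buffer fills during one transfer can be estimated. This is a rough sketch that assumes the 512-byte endpoint size mentioned earlier in the thread and treats every buffer fill as one hand-off; `buffer_fills` is an illustrative helper, not part of any driver API.

```python
def buffer_fills(transfer_bytes: int, buffer_bytes: int) -> int:
    """Number of times a buffer of buffer_bytes fills during a transfer."""
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    # Ceiling division: a partially filled last buffer still counts.
    return -(-transfer_bytes // buffer_bytes)


ENDPOINT_SIZE = 512                 # assumption: 512-byte bulk endpoint
DEFAULT_BUFFER = 8 * ENDPOINT_SIZE  # default described above: 8 * endpoint size
RECOMMENDED_BUFFER = 64 * 1024      # the recommended 64 KB
TRANSFER = 1 << 20                  # one 1 MB transfer

print("default buffer:", buffer_fills(TRANSFER, DEFAULT_BUFFER), "fills")
print("64 KB buffer:  ", buffer_fills(TRANSFER, RECOMMENDED_BUFFER), "fills")
```

                  Under these assumptions a 1 MB transfer fills the default 4 KB buffer 256 times but a 64 KB buffer only 16 times.
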