DMA_BUF_SIZE vs CY_FX_SLFIFO_DMA_BUF_COUNT_P_2_U:
DMA_BUF_SIZE is the size of each buffer in the DMA channel being created.
CY_FX_SLFIFO_DMA_BUF_COUNT_P_2_U is the number of such buffers allotted to the DMA channel.
In AN75705, we have explained two schemes to receive the data through GPIF as follows:
1. The first is shown in Diagram 10. Here there is one producer socket and one consumer socket for a DMA channel and its associated buffers. When there is no flow control mechanism over the GPIF interface (no flags are used), data loss is possible.
For example, say we have 4 buffers to receive the data. Once GPIF starts sampling, the data is stored in the DMA buffers. When one buffer is full, the GPIF has to wait until another buffer is available to store the data. This is called buffer switching, and it takes around 1 microsecond. While buffer switching takes place, the external processor connected to GPIF keeps writing data to GPIF, but there is no buffer available to store it, so that data is lost.
2. The second is shown in Diagram 11. Here there are two producer sockets and one consumer socket for a DMA channel, and each producer has its own set of buffers. If you set the buffer size to 16 KB and the buffer count to 4, the total buffer memory allocated for this channel is 128 KB (16 KB x 4 x 2 producers). In this scheme, when a buffer of one socket is full, the GPIF state machine switches to the other socket. Note that socket switching takes place in one clock cycle. This helps in receiving the data without latency or data loss.
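The 128 KB figure in scheme 2 can be checked with a few lines of C. This is only a sketch of the arithmetic and does not call the FX3 SDK; the function name dma_channel_bytes is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Total DMA buffer memory for one channel, assuming every producer
 * socket gets its own set of buf_count buffers of buf_size bytes.
 * (Hypothetical helper, not part of the FX3 SDK.) */
size_t dma_channel_bytes(size_t buf_size, size_t buf_count, size_t producers)
{
    assert(buf_size > 0 && buf_count > 0 && producers > 0);
    return buf_size * buf_count * producers;
}
```

With 16 KB buffers, a count of 4 and 2 producers, this returns 131072 bytes (128 KB). With a single producer, as in scheme 1, the same settings need 64 KB.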
Increasing the throughput using Maximum Buffer size:
Say we use smaller buffers (for example 8 KB with a count of 8) and FX3 receives 32 KB chunks of data from an external processor connected over GPIF with a flow control mechanism. Every 32 KB chunk needs four 8 KB buffers, which means 3 buffer switches are needed to receive one chunk. Since buffer switching is time consuming, the external processor has to wait each time for a buffer to become available.
If we instead configure the DMA channel with 32 KB buffers and a count of 2 or 4, a buffer switch takes place only once per 32 KB chunk. This is how the throughput increases.
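The buffer count per chunk in the example above can be worked out as follows. This is a plain C sketch of the arithmetic only; the helper names are hypothetical and nothing here touches the FX3 SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Number of DMA buffers needed to hold one chunk (ceiling division). */
size_t buffers_per_chunk(size_t chunk_bytes, size_t buf_bytes)
{
    assert(buf_bytes > 0);
    return (chunk_bytes + buf_bytes - 1) / buf_bytes;
}

/* Buffer switches that happen while a single chunk is being received:
 * one fewer than the number of buffers the chunk occupies. */
size_t switches_within_chunk(size_t chunk_bytes, size_t buf_bytes)
{
    size_t n = buffers_per_chunk(chunk_bytes, buf_bytes);
    return n > 0 ? n - 1 : 0;
}
```

For a 32 KB chunk, 8 KB buffers give 4 buffers and 3 switches inside the chunk, while 32 KB buffers give 1 buffer and no switch inside the chunk, so the only switch left is the one between chunks.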
Thank you for your long answer. I understand what you were pointing me to in AN75705, and I understand the mechanism behind it.
But will using two sockets increase throughput in my case? Or is it only useful for an application processor without flow control? In other words: I have a processor with flow control; will two sockets give more throughput in this case?
If I want to test the firmware (e.g. SlaveFifoSync) with two producer sockets and the FPGA attached, is it enough to change this code in the firmware:
dmaCfg.prodSckId = CY_FX_PRODUCER_PPORT_SOCKET;
dmaCfg.consSckId = CY_FX_CONSUMER_USB_SOCKET;
dmaCfg.cb = CyFxSlFifoPtoUDmaCallback;
<--- Should I just insert a second producer socket here? Is that enough if I use auto DMA mode, or do I need to use multichannel mode?
Throughput will increase when you use two producer sockets and one consumer socket with the socket switching mechanism, compared to the flow control implementation in AN65974.
If you use the many-to-one channel implementation from AN75779, you cannot send data over different endpoints.
If you want to send data over different endpoints using a many-to-one channel, you need to modify the GPIF state machine. This requires additional control lines from the FPGA to switch the sockets.
Yes, you have to add the second producer socket when creating the DMA channel.
I can see that you have made this modification in the files mentioned in this thread.
Note that even in this case you need to change the GPIF state machine.
I only want one endpoint with two sockets (P2U transfer). Do I then have to modify the GPIF state machine in SlaveFifoSync? Do I then need additional control lines from the FPGA to FX3 to switch the sockets?
Cypress provides the UVC example with a GPIF state machine integrated into the example. That state machine has two states, PUSH_DATA_SCK0 and SCK1, which alternate when the buffer count is full.
But unlike SlaveFifoSync, the UVC example does not need an address line; the thread_0 / thread_1 selection is made inside the states.
I need help implementing this. Perhaps we can go through it by email? In the thread "SlaveFiFoSync with only one endpoint and SlaveFifoSync with 2 Sockets" I cannot even get the one-endpoint project to work.
In the UVC case, the sensor or FPGA provides the FV, LV, PCLK and data lines. We use LV, FV and the internal GPIF counters to switch between the two producer sockets.
If you can provide the control lines in a similar way, you can use the GPIF state machine from the UVC app note AN75779. Otherwise, please go with the GPIF state machine from the Slave FIFO app note AN65974.
Alternatively, you can write a custom state machine to suit your requirements.
Please let me know the following details:
What is the expected throughput in your case?
What is your application?
What is the interface of FPGA here?
Sorry Srdr for the late answer.
I examined the maximum possible throughput on my usb controller with USBBulkSourceSink firmware. It showed 431 MB/s. So without using FPGA and GPIF I can have 431 MB/s as maximum datarate from fx3 to computer.
GPIF with 32 bits at 100 MHz on the FPGA gives a maximum theoretical throughput of 381 MB/s. (Calculation: 32 bit * 100 MHz = 3200E6 bit/s = 400E6 byte/s = 381.46973 MByte/s, counting 1 MByte as 1024 * 1024 bytes.)
Using the normal SlaveFifo example gives me roughly 379 MB/s. This is great and I'm thankful for this. But I wonder: is it possible with a 2-socket implementation to go as high as the theoretical GPIF II limit?
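For reference, the theoretical limit can be reproduced with a short C sketch. It assumes one bus-wide word is transferred on every clock and that "MB" in the figures above means 1024 * 1024 bytes; the function names are made up for illustration.

```c
#include <assert.h>

/* Peak bus rate in bytes per second, assuming one word of bus_bits
 * is transferred on every clock edge. */
double gpif_peak_bytes_per_sec(unsigned bus_bits, double clock_hz)
{
    assert(bus_bits % 8 == 0);
    return (bus_bits / 8) * clock_hz;
}

/* Convert bytes per second to MiB per second (1 MiB = 1048576 bytes). */
double to_mib_per_sec(double bytes_per_sec)
{
    return bytes_per_sec / (1024.0 * 1024.0);
}
```

For a 32-bit bus at 100 MHz this gives 400E6 bytes/s, or about 381.47 MiB/s. If the measured 379 MB/s is in the same unit, that is already a little over 99% of the bus limit.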
Thank you srdr.
In the Slave FIFO example, you have one IN channel (GPIF producer to USB consumer) and one OUT channel (USB producer to GPIF consumer).
As long as you operate on one channel (either IN or OUT), the throughput will be high (379 MB/s). It may not go higher than this, because the flags introduce a delay.
If you have multiple channels (say, two IN channels: 1. GPIF II producer_1 to USB consumer_1; 2. GPIF II producer_2 to USB consumer_2), the throughput may drop when you switch between the channels, because switching channels and updating the flags adds delay.
In the UVC case, you have one channel with two GPIF II producer sockets and one USB consumer socket.
Socket switching is handled in the GPIF II state machine using the counters. Flags do not come into the picture here.
Note that buffer switching takes longer than socket switching. Hence, you may get higher throughput (than 379 MB/s) with a UVC-style implementation.
We have not compared the throughput of the two methods, so the numbers are not known.
If you are happy with 379 MB/s, please go with the Slave FIFO example method.
Otherwise, you may try the UVC example method.
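Since no measured comparison is available, a rough back-of-the-envelope model may help to see how large the gain could be. The sketch below is an assumption, not a measurement: it treats every buffer fill as being followed by one fixed stall (about 1 microsecond for buffer switching, one clock cycle for socket switching, as stated earlier in this thread) and ignores every other effect, such as flag latency or the USB side. The function name is hypothetical.

```c
#include <assert.h>

/* Fraction of time the bus is actually moving data when each buffer
 * fill is followed by a fixed switching stall (simplified model). */
double bus_efficiency(double buf_bytes, double bytes_per_clock,
                      double clock_hz, double switch_seconds)
{
    assert(buf_bytes > 0 && bytes_per_clock > 0 && clock_hz > 0);
    double fill_seconds = buf_bytes / bytes_per_clock / clock_hz;
    return fill_seconds / (fill_seconds + switch_seconds);
}
```

With 16 KB buffers on a 32-bit bus at 100 MHz, the model gives roughly 0.976 for a 1 microsecond stall and roughly 0.9998 for a one-clock stall (10 ns). Real numbers will differ, so treat this only as an indication that the remaining headroom above 379 MB/s is small.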