Data dropped as DMA_Ready flag goes low for a long time - CYUSB3014 FX3

vsarr

Hello there,

I am using a CYUSB3014 as a synchronous slave fifo interface to an Altera FPGA master.

I have modified the 2-bit slave fifo sync firmware (attached) by adding two endpoints and I'm using it to transfer data at an expected rate of about 80 MB/s. The FX3 should handle this without breaking a sweat, but instead I am getting a lot of dropped data.

I am probing the related DMA_Ready flag with Signal Tap in Quartus. I see that it goes low at some point during the transfer and stays low for such a long time that I can't even see the end of it within the Signal Tap time range. I have implemented a FIFO in the FPGA as a buffer, but it's not enough.

Any help towards solving this issue would be really appreciated. Thanks in advance.

JayakrishnaT_76

Hello,

I referred to the project that was shared in your description. I found that you have commented out the code for initializing the debug prints. Please let me know if it is possible to enable the debug prints. This is because we can understand if there are any failures by putting UART prints within the firmware.

Also, please provide the following information so that we can understand the problem better:

1. The GPIF II Designer project that is used for developing this project. This is because I find that you have added 4 endpoints whereas the default AN65974 state machine makes use of just 2 endpoints.

2. Please let me know on which endpoint the issue is seen. Also, please let me know when the issue is seen. Are you able to transfer some data through the endpoints or are you not able to transfer any data at all?

3. Please share the traces of DMA_Ready_Flag captured.

4. What is the interface clock between FX3 and the FPGA?

Is the data drop seen because FX3 is asserting the ready flag later than expected and the FPGA is not capable of buffering that much data? Please confirm if my understanding is correct or not.

Best Regards,
Jayakrishna
0 Likes
vsarr

Dear Jayakrishna,

Thanks for your prompt reply. It is indeed possible to enable the debug prints and I just did, but no error is displayed on the UART console. As for the other points you raised:

1. I am attaching the project (project.zip) to this reply.

2. The issue is seen on endpoint 0x83. I am able to transfer some data, but eventually data loss occurs as the FX3 seems unable to keep up.

3. I am attaching a screenshot of my Signal Tap session (signaltap.png) where you can see that flag D goes low and after some time the fifo full signal is asserted.

4. I have a 100-MHz clock generated by one of the PLLs in the FPGA.

Your understanding is correct: by the time the FX3 asserts the flag, the FIFO in the FPGA is already full and dropping data.

0 Likes

Hello,

Thank you for sharing the details requested.

1. Based on these details, I can understand that the interface clock between FX3 and FPGA is 100MHz. Also, the data bus width is 32 bits. This means that the data rate should be 400MBps and not 80MBps. But, in the original description, I find that you have mentioned data rate as 80MBps. Please let me know if I went wrong somewhere.

2. Is the issue seen only on the endpoint 0x83 or is the same symptom seen on all the endpoints? 

3. I referred to the GPIF II project that was shared in your previous response. I saw that you are just using READY flag of all the threads to initiate and terminate the data transfers. The ready flag takes some time to indicate the buffer full or buffer empty condition, so it is not recommended to use ready flag for terminating the data transfer. For starting the transfer, you can make use of READY flag. But, for terminating the transfer, you need to make use of Partial flag. Please refer to AN65974 to understand more about Partial flag. The link to AN65974 is given below:

https://www.cypress.com/file/136056/download

As you are using 4 threads, it is not possible to use 4 dedicated ready and 4 dedicated watermark flags. So, you can make use of current thread flags (ready and watermark). But, when you use current thread flags, there will be additional flag latencies. This is shown in Section 8.2 and Table 4 of AN65974.

So, kindly make use of ready and partial flags to perform the data transfer. This would require changes to be made on FPGA side also.

4. In the traces of Flag D, I can find that there is a small glitch before going LOW. I think this indicates that the previous buffer was filled completely. Please confirm if my understanding is correct or not.

Best Regards,
Jayakrishna
0 Likes

1. Sorry about the misunderstanding. I actually meant that I am expecting a data rate of approximately 80 MBps from an ADC connected to the FPGA, and this data stream is supposed to be sent to the computer via the FX3.

2. The other endpoints are configured differently. I can try, but I will need some time to reconfigure the FX3 firmware, FPGA firmware, and acquisition software.

3. I agree that using the Ready flag is not ideal for my purpose. However, I implemented workarounds in the FPGA firmware so that no data is lost due to the latency. I also tried using the dedicated Partial flag to terminate the transfer as you suggested, but I'm back to square one as the pause is just as long. As you can see in the watermark.png screenshot where the Partial flag (active low) is FLAGC, both flags exhibit the same behaviour, give or take a handful of clock cycles. I still need to wait for the Ready flag to be asserted in order to be able to resume sending data.

4. Your understanding is correct. The previous buffer was filled completely, as I'm sending packets that exceed the buffer size, but I implemented the FPGA firmware so that no data is lost due to that kind of glitch (and verified this). I also tried splitting the data into packets smaller than the buffer size, but eventually I get the same issue.

0 Likes

Hello,

Thank you for the updates.

So, as per my understanding, the data rate between FX3 and the FPGA is 400MBps. Please correct me if my understanding is not correct.

Also, I saw that the delay before the ready flag goes HIGH again between transfers, as shown in the signaltap.png that was shared in your previous response, is short. So, does the ready flag take a long time to go HIGH only sometimes? If yes, after how many buffers are you seeing this delay?

Best Regards,
Jayakrishna
0 Likes

Your understanding on the data rate is correct.

I have yet to identify a precise pattern for when the ready flag goes high but it seems to be correlated with the packet size. So far, I have observed that I can transfer roughly between 4 to 130 buffers before noticing data loss. Shorter packets result in data being lost earlier than when larger packets are transferred.

0 Likes

Hello,

Thank you for the confirmation.

Please let me know what exactly was meant by packet size in your previous response? Also, please let me know the size of short and large packets. 

In addition to this, by the following statement

"So far, I have observed that I can transfer roughly between 4 to 130 buffers before noticing data loss."

did you mean to say that after this many buffers, the ready flag takes a lot of time to go HIGH again? Please confirm.

Is it possible to share the Wireshark traces when the test is done?

Best Regards,
Jayakrishna
0 Likes

I'll try to explain what I mean by packet size. Basically, I need to transfer data that is composed of blocks of 4104 bytes. The short packet option I am referring to consists of ending the packet at the last word of each block, so that every block is sent as an individual packet. I also tried combining several blocks into larger packets to reduce the number of read requests on the computer side. I also tried streaming the data continuously (as in the AN65974 example) until the user stops the acquisition. When I send each block of data as an individual packet, I get the pause in the Ready flag earlier than when I send larger packets or stream the data continuously.

As for the statement you quoted, I confirm that after those many buffers (approx. 4 when sending each block as a packet and 130 when streaming the data) the Ready flag takes a lot of time to be asserted again.

My familiarity with Wireshark is very limited, but a colleague of mine acquired some data, which I'm going to attach. He also sent me a screenshot in which he highlighted the moment the FX3 becomes unresponsive, and I'm sharing that as well.

0 Likes

Hello,

By the following statement in your previous response, 

"I confirm that after those many buffers (approx. 4 when sending each block as a packet and 130 when streaming the data) the Ready flag takes a lot of time to be asserted again."

and based on the fact that the Ready flag is asserted after a delay, it looks like the host application is slow in issuing the IN tokens needed to drain the DMA buffers that are filled by the GPIF II block of FX3. From the firmware, I saw that you have allocated 2 DMA buffers for this channel. Although each DMA buffer can hold a maximum of 16KB of data, a buffer gets wrapped up as soon as you send a short packet, even if it is not full. This could be the reason why the flag assertion gets delayed sooner when you are sending short packets.

Please try the following and let me know the results:

1. Increase the DMA buffer count for this particular channel and let me know if you are seeing any improvements with respect to the number of buffers after which the pause is seen.

2. Change this particular channel type from CY_U3P_DMA_TYPE_AUTO to CY_U3P_DMA_TYPE_AUTO_SIGNAL. After this, add notifications for PROD and CONS events and register a callback function for this channel. Increment a global count variable when you receive a PROD event and decrement the same when you receive a CONS event. Print the value of this global count variable inside the infinite for loop. After this, share the value of this global count variable when the pause is seen. 

Best Regards,
Jayakrishna
0 Likes

Hello Jayakrishna,

Thanks for your input!

1. I now have 10 buffers and the number of buffers before the pause has increased by about 40%.

2. I have enabled the event notifications as per your suggestion. Interestingly, I get the following output:
CyU3PDmaChannelCommitBuffer failed, Error code = 71
Event tracker: 0.
CyU3PDmaChannelCommitBuffer failed, Error code = 71
Event tracker: 1.
CyU3PDmaChannelCommitBuffer failed, Error code = 71
Event tracker: 2.
CyU3PDmaChannelCommitBuffer failed, Error code = 71
Event tracker: 3.

I am not sure what error 71 is but we might be onto something here.

0 Likes

Hello,

The channel type CY_U3P_DMA_TYPE_AUTO_SIGNAL is similar to the AUTO channel, but you can get the PROD and CONS notifications by registering a callback function. There is no need to commit the data by using the API CyU3PDmaChannelCommitBuffer(). As I mentioned in my previous response, inside the PROD_EVENT, just increment a global counter variable. When a CONS_EVENT is received, decrement the global counter variable. Then print the value of this global counter variable in the infinite for loop. Please do not use any other APIs inside the DMA callback function.

Best Regards,
Jayakrishna
0 Likes

Sorry, I'm afraid this is a bit outside my expertise (I'm more on the FPGA side of this project) and the guy who works on this is away until the end of the month.

I am attaching the source I'm working with. Would you be able to have a look and insert the code we need to debug the issue? Thanks in advance!

0 Likes

Hello,

I have not increased the DMA buffer count for this particular channel as we already saw that increasing the DMA buffers will reduce the frequency of the issue. Please find the source code having the global variable added inside the DMA callback function. Please build the firmware with this source file and share the UART debug logs when the issue is seen.

Best Regards,
Jayakrishna
0 Likes

Much appreciated, thanks! The debug log says "Data tracker: buffers left to be transmitted is 2" when the issue is seen, whereas it said "Data tracker: buffers left to be transmitted is 0" before that.

0 Likes

Hello,

As mentioned before, the issue is caused by the slow host. From the debug logs, it is clear that when the issue is seen, all the buffers inside FX3 are full. So, it should be emptied by the host so that it can be filled again by the GPIF II block of FX3. Please check the source code of host application to understand why the host is slow in issuing IN tokens.

Best Regards,
Jayakrishna
0 Likes

It makes sense. As a starting point, is there any specific API call I should be looking for that sends the IN tokens?

0 Likes

Hello,

Please let me know which library is used for developing the host application. Is the device bound to the cyusb3.sys driver?

As a test, you can try to use the streamer application that comes along with FX3 SDK. You can read the data from the FX3 device using this application and then check if the flag assertion is getting delayed. 

Best Regards,
Jayakrishna
0 Likes

The current FPGA firmware does not interface directly with the Streamer application so I'll need some time to make that test.

As for the host application, it uses libusb-1.0-0, runs under Linux and doesn't use the cyusb3.sys driver.

0 Likes

Hello,

Please refer to the following link which has more information on using asynchronous APIs for the transfers:

https://vovkos.github.io/doxyrest/samples/libusb/group_libusb_asyncio.html#details-group-libusb-asyn...

Best Regards,
Jayakrishna
0 Likes