CYUSB3013 low control read performance with FX3 SDK library versions 1.3.2 and higher


AlSh_4533926

The newest versions of the FX3 SDK firmware library (1.3.2 and higher) have a problem with CYUSB3013: abnormally slow processing of standard EP0 control read requests (reproduced with the "cyfxbulklpautoenum" example from the FX3 SDK).

CYUSB3013 with library 1.3.1 can do 8000 EP0 control read operations per second via USB2, while upgrading to 1.3.2 (or higher) reduces performance to 100 operations per second.

The problem is specific to CYUSB3013, and it appears after the CyU3PDmaChannelSetXfer() function is called.

The problem does not happen with CYUSB3014. It also does not happen with EP0 control write operations.

We badly need a bug fix for FX3 SDK library 1.3.4 (and higher) on CYUSB3013!

Downgrading to an old version of the library is not an option.

More details on reproducing the problem:

1) Use the cyfxbulklpautoenum embedded code example from the SDK: https://www.cypress.com/documentation/code-examples/usb-superspeed-code-examples

2) Make sure the embedded code invokes the CyU3PDmaChannelSetXfer() function

3) Use an external computer to issue “Control read” EP0 operations via USB2 (high speed) with the following parameters:

   GET_CONFIGURATION
   bmRequestType = 0x80
   bRequest = 8
   Value = 0
   Index = 0
   Length = 1

4) Measure performance (operations per second)
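
For readers who want to check the request parameters in step 3, here is a minimal sketch of how they map onto the 8-byte USB setup packet. The struct and function names (usb_setup_t, make_get_configuration, setup_to_bytes) are made up for illustration and do not come from the test program; Value, Index and Length correspond to the standard wValue, wIndex and wLength fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Setup packet fields in wire order (hypothetical helper type). */
typedef struct {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
} usb_setup_t;

/* Standard GET_CONFIGURATION request with the parameters from step 3. */
usb_setup_t make_get_configuration(void)
{
    usb_setup_t s = { 0x80, 8, 0, 0, 1 };
    return s;
}

/* Serialize into the 8-byte wire format (multi-byte fields are little-endian). */
void setup_to_bytes(const usb_setup_t *s, uint8_t out[8])
{
    assert(s != NULL && out != NULL);
    out[0] = s->bmRequestType;
    out[1] = s->bRequest;
    out[2] = (uint8_t)(s->wValue & 0xFF);
    out[3] = (uint8_t)(s->wValue >> 8);
    out[4] = (uint8_t)(s->wIndex & 0xFF);
    out[5] = (uint8_t)(s->wIndex >> 8);
    out[6] = (uint8_t)(s->wLength & 0xFF);
    out[7] = (uint8_t)(s->wLength >> 8);
}
```

On a libusb-1.0 host, the same five fields would presumably be passed as the bmRequestType, bRequest, wValue, wIndex and wLength arguments of libusb_control_transfer(), with a 1-byte receive buffer.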



YashwantK_46
Moderator

Hi,

Can you please tell us whether you are performing data transfers over the bulk and control endpoints in parallel?

Also, is the host application requesting data continuously, or does it issue a new request only after it has received the data from the previous request?

Is it possible for you to share the host application and the firmware that you are currently using?

It would help us in understanding the behaviour better and enable us to get to the probable cause.


Regards,
Yashwant


Hi Yashwant,

Here are more details:

1) Bulk data transfer was not used by the host computer in these tests, only the GET_CONFIGURATION control operation.

2) The host application waits for a control transaction to complete, then initiates the next one. Waiting is not a problem with the old version of the library, which achieves 8000 control transactions per second (likely one transaction per 125 µs USB cycle). With the new versions of the library, overall performance drops to 100 transactions per second.

3) I am now preparing a very simple host program and simple FX3 code to demonstrate the problem in as much isolation as possible. It will take some time. Is it OK if the host code is for Linux? Using Linux isolates the program from Windows driver features.
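
The arithmetic behind the figures in point 2 can be sketched as follows. This only illustrates the assumption stated there (one transaction per 125 µs cycle); the function names are invented for this example.

```c
#include <assert.h>

/* Length of one USB 2.0 high-speed cycle (microframe), in microseconds. */
#define MICROFRAME_US 125u

/* Transactions per second if each transaction occupies n microframes. */
unsigned rate_for_microframes(unsigned n)
{
    assert(n > 0);
    return 1000000u / (MICROFRAME_US * n);
}

/* Microframes consumed per transaction at a given rate (rounded down). */
unsigned microframes_for_rate(unsigned per_second)
{
    assert(per_second > 0);
    return 1000000u / (MICROFRAME_US * per_second);
}
```

Under that assumption, rate_for_microframes(1) gives 8000 transactions per second, and microframes_for_rate(100) gives 80, so at 100 transactions per second each control read takes about 10 ms.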

Thank you,

  Alex


Hi Alex,

I am trying to reproduce and test the issue on my side.


If you are done with the host program, can you please share it along with the firmware so that I can test the issue properly?


Regards,
Yashwant


Hi Yashwant, please find below the steps to reproduce the problem on Linux.

Thank you,

   Alex

1       FX3 firmware

Use the following firmware image provided by Cypress in EZ-USB FX3 SDK 1.3.4 for Linux

  1. Download FX3_SDK_1.3.4_Linux.tar.gz from  https://www.cypress.com/file/424271/download
  2. Extract cyusb_linux_1.0.5.tar.gz from FX3_SDK_1.3.4_Linux.tar.gz
  3. Use fx3_images/cyfxbulklpautoenum.img

2       Host computer software

  1. A Linux computer is used as the host.
  2. Using: libusb-1.0 and sudo
  3. Download fx3bug_v2.tgz via the following link:
     https://www.dropbox.com/sh/wgm1wl9wemde4le/AADdu8f_n9Vb_v1iZsAuh7CEa?dl=0
  4. Extract the following files from fx3bug_v2.tgz:

     File                      Description
     controltest.c             Test program to demonstrate the problem
     cyfxbulklpautoenum.img    Firmware image provided by Cypress in EZ-USB FX3 SDK 1.3.4 (downloaded as described above)
     loadfx3.c                 Simple software tool to load a firmware image into the FX3 device (in our case, CYUSB3013)
     Makefile                  Builds controltest and loadfx3 from sources
     run_tests.sh              Bash script to build and run the test

  5. Run the script: “run_tests.sh”

3       Notes on test script

Test script “run_tests.sh” does the following:

  1. Use “make” to build “controltest” and “loadfx3”
  2. Check functionality of “bare” CYUSB3013 (with no firmware loaded, just after power-cycle)
     a) Measure performance for control write: SetConfiguration
     b) Measure performance for control read: GetConfiguration
  3. Load cyfxbulklpautoenum.img into the FX3 device being tested (in our case, CYUSB3013)
  4. Check functionality of “loaded” CYUSB3013

4       Test results

Test script was run with CYUSB3013 FX3 device, connected via USB2, with the following results:

  • The “bare” FX3 device (with no firmware loaded) performs about 23000 control read/write operations per second
  • The FX3 device with SDK 1.3.4 firmware loaded performs fewer than 100 control read operations per second

    $ ./run_tests.sh
    Running tests. Cycle the board power, please. Press <ENTER>
    cc   -lusb-1.0  loadfx3.c -o loadfx3
    cc   -lusb-1.0  controltest.c   -o controltest
    Bare CPU, control write: SetConfiguration
    10000 operations in 0.416679 seconds: 23999.3 operations per second
    Bare CPU, control read: GetConfiguration
    10000 operations in 0.424478 seconds: 23558.3 operations per second
    Load FX3 CPU with Cypress basic_examples/cyfxbulklpautoenum
    Loaded CPU, control write: SetConfiguration
    10000 operations in 0.48143 seconds: 20771.5 operations per second
    Loaded CPU, control read: GetConfiguration. Wait.
    10000 operations in 113.544 seconds: 88.0715 operations per second
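
The rates in the log follow directly from the operation count and the elapsed time. A minimal sketch of that calculation is below; the function names are chosen for this example and are not taken from controltest.c.

```c
#include <assert.h>

/* Throughput: operations completed per second of wall-clock time. */
double ops_per_second(unsigned long ops, double seconds)
{
    assert(seconds > 0.0);
    return (double)ops / seconds;
}

/* Average latency of one operation, in milliseconds. */
double ms_per_op(unsigned long ops, double seconds)
{
    assert(ops > 0);
    return seconds * 1000.0 / (double)ops;
}
```

Applied to the last log line (10000 operations in 113.544 seconds), this gives roughly 88 operations per second, or about 11.35 ms per control read, compared with about 0.048 ms per control write on the same loaded device.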


Hi Yashwant,

Further investigation revealed that the problem is not specific to the CYUSB3013 chip.

Exactly the same problem (low performance) occurs with the original Cypress FX3 development kit (CYUSB3014 chip) and the "cyfxbulklpautoenum" example taken from FX3 SDK revision 1.3.4.

What obscured the situation is that EP0 control read performance depends on the operations previously done with the FX3.

With a simple, straightforward sequence of actions (power cycle, load cyfxbulklpautoenum, initiate GET_CONFIGURATION), the problem manifests itself consistently: only 90 control read operations per second, instead of more than 8000 (up to 25000) op/sec.

An updated Linux script and an updated test report document have been uploaded to https://www.dropbox.com/sh/wgm1wl9wemde4le/AADdu8f_n9Vb_v1iZsAuh7CEa?dl=0

Regards,

  Alex


Hi Alex,


Please refer to the following thread: Re: CYUSB3014 - How to speed up operate control endpoint IN for USB highspeed


Also, please refer to the FX3_SDK_TroubleShooting_Guide provided with the FX3 SDK, Section 2.3, Part IV.

The low performance of CyU3PUsbSendEP0Data() arises because the other IN (BULK, ISO, INTERRUPT) endpoints need to be suspended so that the data over the control endpoint doesn't get corrupted by premature data fetching from the DMA channel.

In FX3 SDK version 1.3.4, you can go through the source of CyU3PUsbSendEP0Data() to see how the IN endpoints are suspended and then resumed after the EP0-IN transfer finishes.

You won't see the same issue in SDK version 1.3.1, because this fix was included only from SDK version 1.3.2 onwards.

Regards,

Yashwant


Hi Yashwant,

What you are describing (the bulk-control interference problem) makes practical use of CYUSB3014/CYUSB3013 almost impossible.

When is this silicon bug scheduled to be fixed?

Thank you,

  Alex


Hello Alex,

As described in the FX3_SDK_TroubleShooting_Guide provided with the FX3 SDK (Section 2.3, Part IV), the low performance of CyU3PUsbSendEP0Data() arises because the other IN (BULK, ISO, INTERRUPT) endpoints need to be suspended so that the data over the control endpoint doesn't get corrupted by premature data fetching from the DMA channel. This suspension was included from SDK version 1.3.2 onwards.

As a workaround you could implement the following snippet of code instead of CyU3PUsbSendEP0Data():

    extern CyU3PDmaChannel glUibChHandle;    /* IN channel handle for EP0 */

    /* Declaration of CyU3PDmaChannelSendData */
    extern CyU3PReturnStatus_t
    CyU3PDmaChannelSendData (
            CyU3PDmaChannel *handle,
            uint8_t         *buffer,
            uint16_t         count);

In the vendor command that you are using, replace CyU3PUsbSendEP0Data() with the below two lines:

    CyU3PDmaChannelSendData (&glUibChHandle, glEp0Buffer, wLength);    /* Use instead of CyU3PUsbSendEP0Data */
    CyU3PUsbAckSetup ();                                                /* Important: ACK the request from the host */

NOTE: This workaround matches the control endpoint performance of SDK 1.3.1, but the firmware must ensure that no BULK-IN transaction takes place while an EP0-IN transaction is in progress; otherwise the control read data will get corrupted.

Also, this workaround is provided only to you and will not be added to future SDK releases because of the problems mentioned above. With future releases, you will have to apply the workaround in your application yourself.
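
To make the NOTE's constraint concrete, here is a small host-compilable model of one way an application might choose between the two send paths. It is only a sketch under loud assumptions: the ep0_guard_t type, its bulk_in_active flag and ep0_guard_choose() are hypothetical names invented for this example and are not part of the FX3 SDK, and it assumes the firmware can reliably know when a BULK-IN transfer may be in progress. Only the SDK calls named in the comments come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these exist in the FX3 SDK. */
typedef enum {
    EP0_SEND_FAST, /* CyU3PDmaChannelSendData() + CyU3PUsbAckSetup() (the workaround) */
    EP0_SEND_SAFE  /* CyU3PUsbSendEP0Data(), which suspends the other IN endpoints */
} ep0_send_path_t;

typedef struct {
    bool     bulk_in_active; /* assumption: maintained by the application itself */
    unsigned fast_sends;     /* control reads sent via the fast path */
    unsigned safe_sends;     /* control reads sent via the stock SDK path */
} ep0_guard_t;

/* Choose the send path for one EP0-IN transfer and count the decision. */
ep0_send_path_t ep0_guard_choose(ep0_guard_t *g)
{
    assert(g != NULL);
    if (g->bulk_in_active) {
        g->safe_sends++;
        return EP0_SEND_SAFE;
    }
    g->fast_sends++;
    return EP0_SEND_FAST;
}
```

A check like this is only as good as the flag behind it: if a BULK-IN transfer can start between the check and the EP0 send, the corruption described in the NOTE may still occur.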

Regards,
Yashwant


Hi Yashwant,

Thank you. Yes, I understand that because of the CYUSB301X silicon bug it is practically impossible to use USB 2.0 control transfers simultaneously with bulk transfers.

So, I am looking for workarounds:

1) Is it possible for two IN bulk endpoints to work in parallel with USB 2.0?

2) Is a DMA AUTO channel stopped during another DMA channel's callback?

3) Will one DMA callback be interrupted by another?

Thank you,

  Alex


Hi Alex,

1) Can you please specify what you mean by two BULK-IN endpoints working in parallel? How do you plan to achieve this?

2) A DMA AUTO channel is not stopped during the DMA callback of another channel. DMA AUTO channels work independently.

3) There is no concept of prioritizing callbacks in FX3; they are handled on a first-come, first-served basis. The current callback is serviced to completion, and only then is the next callback serviced.

A DMA callback is never interrupted by another DMA callback.
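
The first-come, first-served behaviour described above can be pictured with a small FIFO model. This is purely an illustration and not how the FX3 SDK is implemented internally; every name in it (cb_queue_t, cb_queue_push, cb_queue_drain, record_channel) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8

typedef void (*dma_cb_t)(int channel_id);

/* Ring buffer of pending callbacks, serviced in arrival order. */
typedef struct {
    dma_cb_t cb[QUEUE_CAP];
    int      channel[QUEUE_CAP];
    size_t   head;  /* index of the next entry to service */
    size_t   count; /* number of entries waiting */
} cb_queue_t;

/* Append a pending callback; returns 0 on success, -1 if the queue is full. */
int cb_queue_push(cb_queue_t *q, dma_cb_t cb, int channel_id)
{
    assert(q != NULL && cb != NULL);
    if (q->count == QUEUE_CAP)
        return -1;
    size_t tail = (q->head + q->count) % QUEUE_CAP;
    q->cb[tail] = cb;
    q->channel[tail] = channel_id;
    q->count++;
    return 0;
}

/* Run every pending callback to completion, oldest first; returns how many ran. */
size_t cb_queue_drain(cb_queue_t *q)
{
    assert(q != NULL);
    size_t serviced = 0;
    while (q->count > 0) {
        dma_cb_t cb = q->cb[q->head];
        int ch = q->channel[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        cb(ch); /* the next callback cannot start until this one returns */
        serviced++;
    }
    return serviced;
}

/* Example callback: records the order in which channels were serviced. */
int service_log[QUEUE_CAP];
size_t service_log_len = 0;

void record_channel(int channel_id)
{
    if (service_log_len < QUEUE_CAP)
        service_log[service_log_len++] = channel_id;
}
```

In this model a long-running callback delays every callback queued behind it, which is the practical consequence of having no prioritization.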

Regards,

Yashwant
