EZ-USB FX3 Read FIFO latency


JaLo_3720291

Hi Cypress,

I am working on EZ-USB FX3 development. The FX3 works as a bridge between the PC and an FPGA.

                                   GPIF

     PC > FX3 (slave) >>>>>> FPGA (master)

The GPIF interface between the FX3 and the FPGA is based on AN87216. The problem I am facing occurs during SLAVE_RD. According to the application note, valid data always appears 2 cycles after SLRD goes low. In the capture below, SLRD = 0 at CLK[221], so the valid data should appear at CLK[223]. However, the valid data [001F] appeared at CLK[224].

problem1-0_slave_read.png

This always happens when SLRD goes from low to high. Any idea?

** I forgot to mention that the watermark is 2.

     CyU3PGpifSocketConfigure(1, CY_U3P_PIB_SOCKET_1, 2, CyFalse, 1);

I also read section 8.3 of application note AN65974. Does it mean that #SLRD should stay asserted for one more clock cycle (2 x (32/16) - 3 = 1) for the data 001F?

an65974-section-8.3.jpg
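As a rough illustration of the arithmetic in the question above, the expression can be written as a small helper. This is only a sketch: the function name and the `offset` parameter are hypothetical, and the offsets (3 for the read case here, 4 for the write case used later in this thread) are taken from the posts, so they should be checked against AN65974 section 8.3.

```c
#include <assert.h>

/* Cycles remaining after the partial (watermark) flag is sampled,
 * using the expression quoted in the post:
 *     watermark x (32 / bus width) - offset
 * The offset values are assumptions taken from this thread
 * (3 for reads, 4 for writes), not from the SDK headers. */
int cycles_after_partial_flag(int watermark, int bus_width_bits, int offset)
{
    assert(bus_width_bits == 8 || bus_width_bits == 16 || bus_width_bits == 32);
    return watermark * (32 / bus_width_bits) - offset;
}
```

With a watermark of 2 on a 16-bit bus, the read case gives 2 x 2 - 3 = 1 extra cycle, which matches the number in the question.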

Thanks!

Jason

Hemanth
Moderator

Hi Jason,

The waveform indicates that the data at clk 220 and 221 did not change. To answer your question: yes, SLRD should be kept asserted for one cycle after the partial flag is sampled by the FPGA. Are you currently doing that? If yes, can you please share a waveform showing it?

Regards,

Hemanth


Hi Hemanth,

I fixed the above issue by delaying the SLOE and SLCS signals by 2 clock cycles after SLRD goes high.

However, I am facing another issue:

     >>>     FX3_1     >>>

PC                                  FPGA

     <<<     FX3_2     <<<

COM3 is the log from FX3_1.

COM14 is the log from FX3_2.

Case 1a: I send 6 bytes to FX3_1 and get only the first 4 bytes.

Case 2a: I send 3 bytes to FX3_1 and get those 3 bytes + 1 byte of zero padding.

Case 3a: I send <=2 bytes to FX3_1 and cannot get any data. USB Control Center returns error code -997.

In addition, I also tried:

Case 2b: I send 7 bytes to FX3_1 and get those 7 bytes + 1 byte of zero padding.

Case 3b: I send 5 bytes to FX3_1 and get only the first 4 bytes.

Q1: It looks like a 4-byte alignment issue. Is there any limitation on short packet transfers?

Q2: Is it related to DMA alignment?

Q3: Is there any way to print out all the data received from the slave GPIF interface?
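The six cases above happen to fit a simple arithmetic model in which data leaves the slave only in whole 32-bit units over a 16-bit bus. The sketch below is a hypothetical model that reproduces the reported numbers; the thread does not confirm that this is the actual mechanism, and the function name is made up for illustration.

```c
#include <assert.h>

/* Hypothetical model: bytes are carried as 16-bit words (rounded up),
 * and only complete pairs of 16-bit words (32-bit units) reach the host. */
unsigned bytes_delivered(unsigned bytes_sent)
{
    unsigned words16 = (bytes_sent + 1u) / 2u;  /* 16-bit bus cycles used */
    unsigned words32 = words16 / 2u;            /* complete 32-bit units  */
    return words32 * 4u;
}
```

For example, 6 bytes occupy three 16-bit words, of which only one complete pair (4 bytes) is delivered, while 7 bytes occupy four words and arrive as 8 bytes (7 bytes plus 1 byte of padding).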

pastedImage_0.png

Thanks,

Jason

Hemanth

Hi Jason,

- We cannot print the data that is sampled over GPIF.

- The issue is seen when a number of bytes that is not a multiple of 4 is sent before the PKTEND signal is sent to the slave. In the slave state machine, when !SLWR&!PKTEND is true, INTR_CPU is called. INTR_CPU generates a callback in the firmware in which CyU3PDmaChannelSetWrapUp() is called.

So, using CyU3PDmaChannelSetWrapUp() in conjunction with sending a non-multiple of 4 bytes is causing the issue.

Instead, what I have tested is:

- Instead of using the slave FIFO state machine from AN87216, use the AN65974 state machine, where the COMMIT action is used when !SLWR&!PKTEND is true.

- But in that case, we cannot directly use the master state machine from AN87216. The reason: in the master, when you commit a short packet of, say, 4 bytes, the waveform looks like the one below (if the GPIF bus is 16 bits wide):

4bytes_1.PNG

According to the above picture, two 16-bit words are sent out and then, at the next edge, PKTEND is asserted (along with SLWR and SLCS).

If this waveform is presented to the AN65974 slave interface, it samples three 16-bit words and commits the buffer to USB.

So, a slight modification has to be made on the master side to assert PKTEND exactly at the end of the 2 cycles, i.e. with the last word of the 4 bytes.
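The effect of a late PKTEND can be sketched with a one-line model. It assumes, as in the description above, that the slave samples one 16-bit word on every clock where SLWR is asserted, including the cycle in which PKTEND arrives. The function name and parameters are hypothetical.

```c
#include <assert.h>

/* Number of 16-bit words the slave samples for a short packet.
 * pktend_delay_cycles = 0: PKTEND asserted together with the last word.
 * pktend_delay_cycles = 1: PKTEND asserted one clock after the last word
 *                          while SLWR is still asserted. */
unsigned words_sampled(unsigned payload_bytes, unsigned pktend_delay_cycles)
{
    unsigned payload_words = (payload_bytes + 1u) / 2u;
    return payload_words + pktend_delay_cycles;
}
```

For the 4-byte example, a delay of one cycle gives three sampled words, and a delay of zero gives the intended two.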

If your final product uses an FPGA, this can be taken care of easily. Please let me know if you are using the FX3 as master in your final application.

Regards,

Hemanth


Hi Hemanth,

1. We will use the following setup for the prototype stage:

                slave

     >>>     FX3_1     >>>

PC                                  FPGA master

     <<<     FX3_2     <<<

                slave

However, for the development phase, I am currently using the following setup:

       >>> FX3_1 (Master)

PC          [two FX3 are back-to-back connected]

       <<< FX3_2 (Slave)

The master FX3 receives data from the PC and then writes it to the slave over GPIF.

The slave FX3 receives the data from the master via GPIF and then commits it to USB (PC).

The non-multiple-of-4-bytes issue was found while testing with USB Control Center. Here are the state machines of both the master and slave sides:

Master

master_state_machine.png

Slave

slave_state_machine.png

Besides the previous issue, I have other questions:

1. May I ask why the latency is 3 clock cycles instead of 2? According to the timing diagram, the latency should be 2.

csn = SLCS

oen = SLOE

rdn = SLRD

Watermark = 3

Data bus width = 16-bit

read_latency_issue.png

2. The FX3 hangs when sending more than 1024 bytes using USB Control Center.

*COM3 is the master side, COM14 is the slave side.

*The master prints the GPIF state every second.

The 1st transfer is 1024 bytes (max packet size) and the master can still print out the state. However, when I send 1200 bytes in USB Control Center, the master hangs, although it does print the 2 split buffers (1024 + 176) in the DMA event callback. Do you have any idea?

pastedImage_6.png

Thanks,

Jason

Hemanth

Hi Jason,

- I understand that in your final application, the FPGA is the master and the two FX3 devices act as slaves. In that case, there should not be any alignment problem if you use the COMMIT action in the SHORT_PKT state of your slave state machine. When asserting the PKTEND signal, the FPGA should make sure that it is asserted along with the last word sent by the FPGA.

Please let me know if you have any questions regarding this.

- Regarding the 3-cycle latency: you have measured from point 100 (where SLRD goes low) to 103. But can you also look at this timing diagram with respect to the clock? The time difference between the first rising edge where SLRD is asserted and the first rising edge where the driven data is valid should be two clock cycles.

- You are not supposed to perform UART debug prints in a DMA callback. This may be the issue that you are facing. Please remove them and let me know.
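One common way to follow the last point while still keeping the information is to record the values in the callback and print them later from thread context. The sketch below is a generic, SDK-independent model of that idea with a small ring buffer; all names are hypothetical, and in real firmware the first function would be called from the DMA callback and the second from the application thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define LOG_SLOTS 8u

/* Single-producer (callback), single-consumer (thread) ring of byte counts. */
typedef struct {
    uint32_t byte_count[LOG_SLOTS];
    volatile uint32_t head;   /* advanced only by the callback */
    volatile uint32_t tail;   /* advanced only by the thread   */
} deferred_log_t;

/* Called from the callback: store the count, do no I/O.
 * Returns 1 on success, 0 if the ring is full and the entry is dropped. */
int deferred_log_push(deferred_log_t *q, uint32_t count)
{
    uint32_t next = (q->head + 1u) % LOG_SLOTS;
    if (next == q->tail) {
        return 0;
    }
    q->byte_count[q->head] = count;
    q->head = next;
    return 1;
}

/* Called from the thread: print everything recorded so far.
 * Returns the number of entries printed. */
unsigned deferred_log_drain(deferred_log_t *q)
{
    unsigned printed = 0;
    while (q->tail != q->head) {
        printf("DMA buffer: %lu bytes\n", (unsigned long)q->byte_count[q->tail]);
        q->tail = (q->tail + 1u) % LOG_SLOTS;
        printed++;
    }
    return printed;
}
```

With this split, the 1024 + 176 byte case from the question would push two entries from the callback and print both on the next drain.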

Regards,

Hemanth


Hi Hemanth,

1. Do you mean I should update the state machine as follows?

slave_state_machine.png

2. In the picture, each integer point represents a rising edge of the clock (each segment represents a clock cycle).

Therefore, it shows that the latency, from the first rising edge where #SLRD is asserted to the first rising edge where the data is valid, is 3 instead of 2.

3. The hang issue disappeared after I changed the DMA mode from manual to auto.
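The latency definition discussed in point 2 (first rising edge where SLRD is seen low to first rising edge where the data is valid) can be written down as a small counting routine. This is only a sketch over per-edge samples exported from a logic analyzer; the function and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* slrd_n[i] and data_valid[i] hold the values seen at rising clock edge i.
 * Returns the number of clock cycles between the first edge where SLRD
 * (active low) is sampled as 0 and the first edge at or after it where
 * the data is valid, or -1 if either event is not found. */
int read_latency_cycles(const unsigned char *slrd_n,
                        const unsigned char *data_valid,
                        size_t n_edges)
{
    size_t first_low = n_edges;
    for (size_t i = 0; i < n_edges; i++) {
        if (slrd_n[i] == 0) {
            first_low = i;
            break;
        }
    }
    for (size_t j = first_low; j < n_edges; j++) {
        if (data_valid[j]) {
            return (int)(j - first_low);
        }
    }
    return -1;
}
```

Counting this way makes it explicit which edge is taken as the starting point, which is the detail the two posts disagree on.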

Thanks!

Jason

Hemanth

Hi Jason,

- The SHORT_PKT state should have IN_DATA along with COMMIT. In addition, you should make sure that PKTEND is asserted by the master (FPGA) along with the last word being sent (not one clock cycle after).

- The hang may not be related to Auto/Manual mode.

Regards,

Hemanth


Hi Hemanth,

I found that the PCLK is uneven. Any idea?

pastedImage_0.png

Thanks,

Jason

Hemanth

In your firmware, is clockConfig.setSysClk400 set to true?

Regards,

Hemanth


Yes. Below is the clock setup segment.

 

    CyU3PIoMatrixConfig_t io_cfg;
    CyU3PReturnStatus_t status = CY_U3P_SUCCESS;
    CyU3PSysClockConfig_t clkCfg = {
        CyTrue,
        2, 2, 2,
        CyFalse,
        CY_U3P_SYS_CLK
    };

    /* Initialize the device */
    status = CyU3PDeviceInit (&clkCfg);
    if (status != CY_U3P_SUCCESS)
    {
        goto handle_fatal_error;
    }

    // PIB initialize
    CyU3PPibClock_t pibClock;
    CyU3PReturnStatus_t apiRetStatus = CY_U3P_SUCCESS;

    /* Initialize the p-port block. */
    pibClock.clkDiv = 4;
    pibClock.clkSrc = CY_U3P_SYS_CLK;
    pibClock.isHalfDiv = CyFalse;
    /* Disable DLL for sync GPIF */
    pibClock.isDllEnable = CyFalse;
    apiRetStatus = CyU3PPibInit(CyTrue, &pibClock);
    if (apiRetStatus != CY_U3P_SUCCESS)
    {
        CyU3PDebugPrint (4, "P-port Initialization failed, Error Code = %d\r\n",apiRetStatus);
        CyFxAppErrorHandler(apiRetStatus);
    }

    CyFxApplnSetPibDllParameters (CyTrue, 1, 0, 11);
    :
    :
}

static void
CyFxApplnSetPibDllParameters (
        CyBool_t isEnable,              /* Whether to enable the DLL. */
        uint8_t  corePhase,             /* Delay selection for the PIB core clock.
                                           Takes a value between 0 and 15, and applies a delay of
                                           (corePhase * 22.5 degrees) from the master clock. */
        uint8_t  syncPhase,             /* Delay selection for the data synchronizer clock used in async modes. */
        uint8_t  opPhase                /* Delay selection for the output clock driven by FX3. */
        )
{
    /* Disable DLL */
    CY_FX3_PIB_DLL_CTRL_REG &= ~(CY_FX3_PIB_DLL_ENABLE);
    CyU3PBusyWait (1);

    if (!isEnable)
        return;

    /* Configure and enable the DLL. */
    CY_FX3_PIB_DLL_CTRL_REG = (
            ((corePhase & 0x0F) << CY_FX3_PIB_DLL_CORE_PHASE_POS) |
            ((syncPhase & 0x0F) << CY_FX3_PIB_DLL_SYNC_PHASE_POS) |
            ((opPhase & 0x0F)   << CY_FX3_PIB_DLL_OP_PHASE_POS) |
            CY_FX3_PIB_DLL_HIGH_FREQ |
            CY_FX3_PIB_DLL_ENABLE
            );

    /* Reset the DLL */
    CY_FX3_PIB_DLL_CTRL_REG &= ~(CY_FX3_PIB_DLL_RESET_N);
    CyU3PBusyWait (1);

    /* Clear Reset */
    CY_FX3_PIB_DLL_CTRL_REG |= CY_FX3_PIB_DLL_RESET_N;
    CyU3PBusyWait (1);

    /* Wait for DLL to lock */
    while (!(CY_FX3_PIB_DLL_CTRL_REG & CY_FX3_PIB_DLL_LOCK_STAT));
}
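For reference, the clock numbers implied by the settings above can be worked out with a few lines of arithmetic. The sketch assumes a 19.2 MHz crystal, a system clock of 403.2 MHz when setSysClk400 is true (384 MHz otherwise), and a PIB clock equal to the system clock divided by clkDiv, with isHalfDiv adding 0.5 to the divider. These values and the helper names are assumptions for illustration and should be checked against the FX3 SDK documentation. The 22.5 degree step is taken from the corePhase comment in the posted code.

```c
#include <assert.h>

/* System clock in kHz (assumed values for a 19.2 MHz crystal). */
unsigned long sys_clk_khz(int set_sys_clk_400)
{
    return set_sys_clk_400 ? 403200ul : 384000ul;
}

/* PIB clock in kHz: SYS_CLK / (clkDiv + 0.5 if isHalfDiv is set).
 * Computed with a doubled divider to stay in integer arithmetic. */
unsigned long pib_clk_khz(unsigned long sys_khz, unsigned clk_div, int is_half_div)
{
    assert(clk_div >= 2u);
    unsigned long twice_div = 2ul * clk_div + (is_half_div ? 1ul : 0ul);
    return (2ul * sys_khz) / twice_div;
}

/* Core clock phase in tenths of a degree: corePhase * 22.5 degrees. */
unsigned core_phase_tenths_deg(unsigned core_phase)
{
    return (core_phase & 0x0Fu) * 225u;
}
```

Under these assumptions, setSysClk400 = CyTrue with clkDiv = 4 gives a 100.8 MHz PIB clock, and corePhase = 1 corresponds to 22.5 degrees.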

Hemanth

Hi Jason,

I have not seen such a clock in my setup.

Regards,

Hemanth


Hi Hemanth,

1. The hang issue was resolved after removing those debug prints.

2. The uneven PCLK turned out to be caused by the sampling rate of the logic analyzer.

3. IN_DATA is added back to the SHORT_PKT state along with COMMIT, and the master asserts PKTEND along with the last word being sent. We have the following observations:

DMA mode = AUTO

DMA size = 1024,

DMA count = 16,

Watermark = 3 (i.e. 3 x (32/16) - 4 = 2 clock cycles),

USB max packet size = 1024

a. The master sends a short packet (< 1024 bytes) to the slave. When we then bulk-in the short packet in USB Control Center, we get the correct data and data length.

b. However, if the master sends data larger than the DMA size (i.e. 1024 bytes + 128 bytes) to the slave, the 128 bytes are lost.

1.png

The above capture shows the start of the 1024B transfer.

* aful - DMA watermark flag

  csn - SLCS

  wrn - SLWR

  pen - PKTEND

On the master side, the DMA watermark flag is used as the trigger for beginning a transfer. The GPIF state machine is started after the DMA channel is created.

2.png

The above capture shows the end of the 1024B transfer.

3.png

The above capture shows the last 128B transfer.

4.png

The above capture shows the transition from 1st 1024B transfer to last 128B transfer.

pastedImage_7.png

The above capture is the state machine of the slave side.

Do you have any idea? My understanding is that once the master sends 1024 bytes of data (the same as the DMA size) to the slave, the FX3 commits the data automatically to USB and switches to the next DMA buffer after the DMA_RDY flag is de-asserted. Therefore, PKTEND is not asserted in this case. After the DMA_RDY flag is de-asserted, the master can send the remaining 128B (i.e. the short packet) to the slave with PKTEND asserted at the last word.

By the way, does a short packet mean a packet whose size is less than the size of one DMA buffer?

Thanks,

Jason

Hemanth

Hi Jason,

- Yes, if the DMA buffer size of the slave is 1024 bytes, then once the slave receives 1024 bytes, the buffer gets committed to USB and no PKTEND is required in that case.

When the number of bytes sent to the slave over GPIF is less than the DMA buffer size, it is a short packet and PKTEND assertion is required.

- To debug the current scenario, make the channel MANUAL and see if you get a callback when COMMIT is executed in the state machine. If one is received, check what the count is.
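The rule in the first point (full buffers commit on their own, anything shorter needs PKTEND) can be sketched as a small planning helper for the master side. The type and function names are hypothetical and the code only models the byte counts, not the bus signalling.

```c
#include <assert.h>

typedef struct {
    unsigned full_buffers;  /* buffers filled completely; no PKTEND needed   */
    unsigned short_bytes;   /* trailing bytes that need PKTEND; 0 if none    */
} transfer_plan_t;

/* Split a transfer into full DMA buffers and an optional trailing short packet. */
transfer_plan_t plan_transfer(unsigned total_bytes, unsigned dma_buf_size)
{
    transfer_plan_t plan;
    assert(dma_buf_size > 0u);
    plan.full_buffers = total_bytes / dma_buf_size;
    plan.short_bytes  = total_bytes % dma_buf_size;
    return plan;
}
```

For the 1024 + 128 byte case with a 1024-byte buffer, this yields one full buffer and a 128-byte short packet that must end with PKTEND, while 2048 bytes yields two full buffers and no short packet.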

Regards,

Hemanth


Hi Hemanth,

I have tested 2 cases:

a. 1024 + 128 bytes

The result always has the last 128 bytes of data missing.

b. 2048 bytes

missing_data_console_log.png

For test case b, I ran the test twice. The printout above comes from the DMA callback (PROD event), and the results differ:

Test b1 (blue rectangle): It indicates that the FX3 received 2KB of data, but error code 997 is shown when doing a bulk-in in USB Control Center.

Test b2 (green): only 1026 bytes are received by the FX3 and the same number of bytes are received in USB Control Center. For this case, I have a screen capture showing strange behavior of the DMA partial flag: it did not assert even after reaching the watermark level (watermark = 3) in the last 1KB transmission.

missing_data_logic_analyzer.png

^ overall transmission waveform

missing_data_logic_analyzer_3.png

^ correct DMA partial flag waveform (1st 1KB transmission)

missing_data_logic_analyzer_2.png

^ strange behavior of DMA partial flag waveform (2nd 1KB transmission)

* Also, the FPGA side always asserts PKTEND in the last transmission (this is a wrong implementation, but I think it's not related to this missing-data issue).

Thanks,

Jason


Hi Hemanth,

I was testing the data throughput. The test environment is Windows 10. The test procedure is:

1. The PC keeps sending bulk-out data to FX3_1 (Slave).

2. The FPGA* (Master) keeps reading data from FX3_1.

3. The FPGA then writes the data to another FX3, FX3_2 (Slave).

4. The PC continuously bulks in data from FX3_2.

* For step 2, the FPGA actually uses the logical AND of the DMA ready flag (Empty) and the DMA watermark flag (Almost Empty). Therefore, the FPGA can use the DMA watermark flag as the trigger to terminate the data read, and the DMA ready flag as the trigger to begin the data read.

* For step 3, the FPGA checks the DMA ready flag (Full) to see if it can start a data write operation, and checks the DMA watermark flag to terminate the current write operation.

I found the following issues:

1. The PC submits 16 URBs (with timeout = 0) to send data to FX3_1. At random points, the FPGA stopped reading from FX3_1 and the PC's bulk-out also stalled. I then checked the DMA ready flag and DMA watermark flag of FX3_1. The DMA ready flag stayed 'HIGH' (active low) and the DMA watermark flag stayed 'LOW' (active low). Since the FPGA logically ANDs both signals, the FPGA read stopped.

My question is: what makes this happen? Is it related to the watermark flag? I suspect that 1 data word is still left in the DMA buffer, which makes the flags incorrect and somehow makes the FPGA think that it has read all the data out.

Both GPIF interfaces run at 50MHz. The DMA buffer count is 16. The DMA buffer size is 1KB. The DMA watermark is 3.

FX3_1:

watermark: 3

FX3_2:

watermark: 3

Thanks,

Jason

 

Hemanth

Hi Jason,

If you take the logical AND of the DMA Ready (active low) and watermark (active low) flags, the FPGA will stop reading after the watermark goes low. You mentioned the same in your comment.

But after the watermark flag is asserted, as you already know, a few more reads (the number depends on your watermark setting) have to be issued by the FPGA before the buffer is emptied. How is this being taken care of?
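The combined signal described above can be modelled as a plain truth table. The sketch assumes the polarity given in the posts (both flags active low, so 0 means asserted) and uses a hypothetical function name.

```c
#include <assert.h>
#include <stdbool.h>

/* Both inputs are active low: false (0) = asserted, true (1) = de-asserted.
 * The output is low whenever either flag is asserted. */
bool anded_flag(bool dma_ready_n, bool watermark_n)
{
    return dma_ready_n && watermark_n;
}
```

Under this model, the reported final pin state (DMA ready HIGH, watermark LOW) produces a LOW output, the same level as when the DMA ready flag itself is LOW, so the ANDed signal alone cannot tell those two states apart.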

Regards,

Hemanth


Hi Hemanth,

After the logical AND of both flags, the FPGA reads for 3 more cycles after the ANDed signal is asserted and ends the data read by de-asserting SLRD (SLCS and SLOE are de-asserted 2 clocks later). Normally, I can see DMA ready asserted after SLCS and SLOE are de-asserted, so the FPGA stops reading the FX3. However, when the abnormal case happens, DMA ready is held HIGH (de-asserted) after SLCS and SLOE are de-asserted.

1. Overview of data read

1.png

2. Start of data read

start.png

3. End of data read

end.png

Thanks,

Jason

Hemanth

Hi Jason,

- In Figure 3, related to the end of data, I see that neither the DMA RDY flag nor the DMA watermark flag is asserted. Can you comment on this?

- Regarding the issue you described, does it happen only sometimes? (What is the frequency?) Normally, if instead of 16 URBs you just send one DMA buffer worth of data from the PC to FX3_1, can the FPGA read that buffer correctly?

Regards,

Hemanth


Hi Hemanth,

- In the Figure 3, related to end of data, I see that neither the DMA RDY Flag nor the DMA water mark flag is asserted. Can you comment on this?

You can refer to figure 1 for the assertion of the DMA ready and watermark flags.

- Regarding the issue you have told, does it happen sometimes? (What is the frequency?)

It always happens under the following test conditions:

Two Python programs are used for this test. Both use libusb1 (from PyPI), and I used "zadig-2.4-1.exe" to replace the Cypress driver with WinUSB (v6.1.7600.16385).

Program A:

Use 16 URBs to transfer 100MB data with timeout = 0

Program B:

Use 16 URBs to bulk in data continuously with timeout = 0

The issue always occurs, but not at a particular transfer size. It can stop at the very beginning of the transfer (after a few tens of KB), in the middle (a few tens of MB) or when almost complete (9x MB). Because the stop is random, I cannot capture it with the logic analyzer. I only know the final pin state: DMA ready is HIGH and DMA watermark is LOW.

- Normally, if instead of 16 URBs you just send one DMA buffer worth of data from the PC to FX3_1, can the FPGA read that buffer correctly?

If I set the number of URBs to 1 in both programs, I cannot reproduce this issue.

Other findings:

1. If I keep # of URBs = 16 in both Python programs and set the timeout to a non-zero value, I cannot see the issue.

2. Or, if I keep # of URBs = 16 and timeout = 0 but disable the FPGA loopback mode, so that the FPGA just keeps receiving data from FX3_1 and discards it (data sink) and keeps writing data to FX3_2 (data source) whenever FX3_2 is not FULL, the issue cannot be observed either.

Therefore, I wonder if this issue is related to the DMA failing to handle too many URBs, which then causes the DMA flags to behave abnormally.

Or, can you advise on any API that can be called to discard the unread DMA buffer(s) once this issue happens?

Updates:

I captured the USB traffic. When this issue happens, the USB transfer stops because no ERDY is returned from the FX3. May I ask whether the host will stop the outstanding URB(s) unless it receives an ERDY from the FX3?

200MB_URB4_Failed.png

Thanks,

Jason

Hemanth

Hi Jason,

Having more URBs may not cause abnormal DMA flag behavior.

When the issue occurs, can you verify whether the FPGA issued the correct number of reads after the watermark was asserted? Also, you can check how much of the data the FPGA had actually read out of the FX3 buffer when the issue occurred.

Regards,

Hemanth


Hi Hemanth,

We eventually figured out that there is a bug in the FPGA. We're fixing it and will test again. By the way, I have another question:

I declared an INTR endpoint which is used to report pin status to the host. I created a DMA channel as below:

    /* Create a DMA MANUAL_OUT channel for the consumer socket. */
    /* Set the buffer size based on constants defined in the header file. */
    dmaCfg.size  = 8;
    dmaCfg.count = 1;
    dmaCfg.prodSckId = CY_U3P_CPU_SOCKET_PROD;
    dmaCfg.consSckId = CY_FX_EP_INTR_CONSUMER_SOCKET;
    dmaCfg.dmaMode = CY_U3P_DMA_MODE_BYTE;
    dmaCfg.notification = CY_U3P_DMA_CB_CONS_EVENT;
    dmaCfg.cb = CyFxMainAppDmaCallback;
    dmaCfg.prodHeader = 0;
    dmaCfg.prodFooter = 0;
    dmaCfg.consHeader = 0;
    dmaCfg.prodAvailCount = 0;

    status = CyU3PDmaChannelCreate (&glDmaHandle_IntrEp, CY_U3P_DMA_TYPE_MANUAL_OUT, &dmaCfg);
    if (status != CY_U3P_SUCCESS)
    {
        CyU3PDebugPrint (4, "CyU3PDmaChannelCreate failed, Error code = %d\r\n", status);
        CyFxAppErrorHandler(status);
    }

But I found that I cannot commit a new buffer until the host has read the previous content. As a result, the host gets the previous pin status rather than the latest one. Can I flush the occupied buffer, or modify the content of the buffer and commit again?

Thanks,

Jason

Hemanth

Hi Jason,

Note that your dmaCfg.size should always be a multiple of 16 bytes.
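As a small illustration of the size rule above, a requested buffer size can be rounded up to the next multiple of 16 before it is assigned to dmaCfg.size. The helper name is hypothetical and not part of the FX3 SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Round a requested DMA buffer size up to the next multiple of 16 bytes.
 * Sizes above 0xFFF0 would not fit in 16 bits after rounding, so they
 * are rejected by the assertion. */
uint16_t round_up_to_16(uint16_t size)
{
    assert(size <= 0xFFF0u);
    return (uint16_t)((size + 15u) & ~15u);
}
```

For the channel in the question, the 8-byte request would become 16 bytes; the firmware can still commit only 8 valid bytes per buffer.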

1. For your problem, you can try increasing the DMA buffer count. OR

You can try doing the following to flush before next commit as you asked:

------------------------

CyU3PUsbSetEpNak (EP_NUMBER, CyTrue);

CyU3PUsbFlushEp(EP_NUMBER);

CyU3PUsbSetEpNak (EP_NUMBER, CyFalse);

------------------------

The above approach is not tested, so you need to validate it. Check whether you are able to commit after doing the flush.

Regards,

Hemanth
