I have a design based on the CY7C68013A (FX2LP) and a Lattice iCE40HX8K FPGA.
The FX2 is configured for internally sourced IFCLK, 30 MHz output, 8-bit FD bus, and FIFO slave mode; AUTOIN/AUTOOUT are turned off, and only the EP6 BULK IN endpoint is used, configured as 4x512-byte packet buffers.
All logic on the FPGA is internally clocked from IFCLK, and each FX2 I/O pin is registered within the I/O buffer rather than in the fabric (i.e. there is no skew between different bitstreams or between different FPGA pins), so all control signals from the FPGA change simultaneously. Moreover, for the purposes of this question, SLOE=1, SLRD=1, FIFOADR=0b10, FD is always an FX2 input, and only FLAGC is used. The FPGA is therefore always driving the FX2, and the only signals between the FX2 and the FPGA that actually change are FD, SLWR, PKTEND and FLAGC; the others are inactive and/or unused.
The host software queues EP6 BULK IN transfers so that there are always enough buffers on the host to retrieve the data.
The FPGA contains an internal FIFO that is filled with bursty data at approximately 5 MB/s (simulated video capture). Each burst contains 482 bytes: a frame counter, a row counter, and 160 RGB pixels. The pixel data is a test pattern consisting of repeats of 00 00 00 01 01 01 ... 1e 1e 1e 1f 1f 1f 00 00 00 01 01 01 ... and so on. The data is generated entirely on the FPGA and is fully synchronous to IFCLK, which is the only clock domain in the system.
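For checking captures on the host, the expected contents of one burst can be regenerated and compared byte by byte. This is a minimal Python sketch under assumptions the description above does not pin down: a one-byte frame counter followed by a one-byte row counter at the start of the burst, and the pattern restarting at 00 on every row. `expected_row` and `find_mismatches` are hypothetical helper names.

```python
PIXELS_PER_ROW = 160
PATTERN_PERIOD = 0x20  # pixel values run 00..1f, then repeat


def expected_row(frame: int, row: int) -> bytes:
    """Build the 482 bytes expected for one burst.

    Assumed layout (not confirmed by the description): byte 0 is the frame
    counter, byte 1 is the row counter, then 160 RGB pixels, with the test
    pattern restarting at 00 on every row.
    """
    header = bytes([frame & 0xFF, row & 0xFF])
    pixels = bytes(
        i % PATTERN_PERIOD
        for i in range(PIXELS_PER_ROW)
        for _ in range(3)  # R, G and B carry the same value
    )
    return header + pixels


def find_mismatches(expected: bytes, received: bytes):
    """Return (offset, expected_byte, received_byte) for each differing byte."""
    return [
        (i, e, r)
        for i, (e, r) in enumerate(zip(expected, received))
        if e != r
    ]
```

Calling `find_mismatches(expected_row(frame, row), captured)` on each captured burst would then list every corrupted offset together with the expected and received values.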
The FPGA contains an internal counter that asserts PKTEND after 482 bytes are written. The FPGA uses the following state machine:
In this diagram, FLAGC is configured as the inverse of the EP6 IN FULL flag, DATA is the output latch of the internal FPGA FIFO, and READABLE is high when the internal FPGA FIFO contains any data. Each time the counter is incremented, the FIFO is advanced by one entry.
Now, there are two cases. If the state machine transition shown in red is not present (eliminated from the FSM entirely), I receive all data correctly. If the red transition is present, I receive almost all data correctly, but occasionally, a few dozen times per second, a byte arrives corrupted. Some of the observed corruptions are, in expected → actual format: 18→01, 00→09, 08→11, 16→1f, 0a→13, 1e→07.
For example, see the packet captures below. First, what a packet should look like:
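One quick check on these pairs is whether each corrupted value is related to the expected one by a fixed offset within the test pattern. The sketch below uses only the six pairs quoted above; the modulus 0x20 is the period of the pattern values (00..1f), and `pattern_offsets` is a hypothetical helper name.

```python
PATTERN_PERIOD = 0x20  # pattern values run 00..1f, then repeat

# (expected, actual) pairs copied from the list above
PAIRS = [
    (0x18, 0x01), (0x00, 0x09), (0x08, 0x11),
    (0x16, 0x1F), (0x0A, 0x13), (0x1E, 0x07),
]


def pattern_offsets(pairs, period=PATTERN_PERIOD):
    """Return (actual - expected) modulo the pattern period for each pair."""
    return [(actual - expected) % period for expected, actual in pairs]


if __name__ == "__main__":
    print(pattern_offsets(PAIRS))
```

For these six pairs every offset comes out as 9, i.e. each corrupted byte holds a value nine steps further along the pattern (modulo 0x20). Whether that holds for every corruption would need checking against full captures.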
Note that at offset 0x60..0x61 there is a header that is expected to change. Everything else should stay constant.
Second, what a corrupted packet looks like:
Note that at offset 0x7d, a byte that should be 0x09 has changed into 0x12.
Next, I've tried to correlate the corruption with the SLWR edges. To do this, I changed the FSM so that the byte written into the FX2 FIFO just before the red transition is taken is ORed with 0x40. The corrupted byte is always the very next byte. For example, here is the expected pixel data with the added logical OR of 0x40:
As can be seen, the state machine takes the red transition each time it has written a pixel into the FX2, which is the expected behavior, as the pixel data is generated more slowly than it can be written into the FX2. Now, consider the pixel data with a glitch:
Note that at offset 0x57, a byte that should be 0x1c (0x5c would also be valid, although unlikely) has changed into 0x05.
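This correlation can also be automated on the host. The following is a minimal sketch, assuming the expected (unmarked) pixel bytes are available as `expected` and the captured bytes as `received`. `MARK` is the 0x40 bit described above; `classify_mismatches` is a hypothetical helper. It is only meaningful for pixel data, where bit 6 is otherwise always zero, not for the header bytes.

```python
MARK = 0x40  # bit ORed into the byte written just before the red transition


def classify_mismatches(expected: bytes, received: bytes):
    """Find corrupted bytes and report whether the previous byte was marked.

    A received byte counts as correct if it equals the expected byte with or
    without the marker bit. Returns a list of
    (offset, expected_byte, received_byte, previous_byte_marked).
    """
    result = []
    for i, (e, r) in enumerate(zip(expected, received)):
        if r in (e, e | MARK):
            continue  # correct byte, marked or unmarked
        prev_marked = i > 0 and bool(received[i - 1] & MARK)
        result.append((i, e, r, prev_marked))
    return result
```

If every reported entry has `previous_byte_marked` set, the corruption always hits the first byte written after the red transition, matching what is visible in the captures by eye.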
For reference, this is an example of a corrupted frame from the video stream, showing that the corruption is not random, but in fact quite regular:
Therefore, it looks like in very rare circumstances, perhaps once per several hundred events, the first byte queued into the FX2 FIFO is corrupted.
Of course, an obvious suspect is a violation of the FX2 timing constraints. However, this does not appear to be the case. The FPGA has a clock-to-out time of 9..10 ns, relative to IFCLK at the FPGA pad. With internally generated IFCLK, the FX2 requires 11 ns of setup time and 0 ns of hold time. Therefore, the timings are satisfied with a very large safety margin. See the diagram below:
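Spelled out numerically, the margin estimate above looks as follows. This is a sketch using only the figures quoted in this post; `round_trip_ns` (IFCLK trace delay from the FX2 to the FPGA plus data trace delay back) is a hypothetical parameter that defaults to zero, which amounts to ignoring board delays.

```python
def sync_write_slack_ns(ifclk_hz, tco_min_ns, tco_max_ns,
                        setup_ns, hold_ns, round_trip_ns=0.0):
    """Return (setup_slack, hold_slack) in ns at the FX2 pins.

    Data launched on one IFCLK edge must be stable setup_ns before the next
    edge, and must not change earlier than hold_ns after the launching edge.
    """
    period_ns = 1e9 / ifclk_hz
    setup_slack = period_ns - (tco_max_ns + round_trip_ns) - setup_ns
    hold_slack = (tco_min_ns + round_trip_ns) - hold_ns
    return setup_slack, hold_slack


setup_slack, hold_slack = sync_write_slack_ns(
    ifclk_hz=30e6, tco_min_ns=9.0, tco_max_ns=10.0,
    setup_ns=11.0, hold_ns=0.0,
)
print(f"setup slack: {setup_slack:.2f} ns, hold slack: {hold_slack:.2f} ns")
# prints "setup slack: 12.33 ns, hold slack: 9.00 ns"
```

With these numbers both slacks are positive by several nanoseconds, and a nonzero `round_trip_ns` trades setup slack for hold slack one for one.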
The next reasonable guess would be metastability inside the FPGA, caused by the FLAGC input to the FPGA not meeting setup/hold constraints. However, the FLAGC input can be completely removed: in this application, USB can service the FX2 much faster than the data is generated, and the condition where all four 512-byte endpoint buffers are full essentially never happens. With FLAGC removed, the FPGA has no inputs at all (other than IFCLK), so the corruption cannot be caused by metastability.
Anyway, I've spent several months at this point trying to debug this issue. Does anybody have any idea what's going on? It has to be some kind of timing condition that involves only IFCLK, SLWR and FD, but I'm at a loss as to where to look next.