I am working on a design that uses the synchronous Slave FIFO interface on the FX3 SuperSpeed Explorer Kit, and I am running into latency problems.
My application sends 36-byte instructions to a 2048-byte FIFO on an FPGA. The Slave FIFO interface reads from the FX3 at 100 MHz and writes the data into the FIFO, while the FPGA reads one instruction per clock cycle of a 0.5 MHz FPGA clock. After some processing, the FPGA sends results back with a "timestamp" of sorts based on that 0.5 MHz clock, so I can see when each instruction was received by the FPGA. I am using the Slave FIFO interface to communicate with my host computer, a Wandboard Quad-Core single-board computer running Ubuntu 14.04, over a USB 2.0 connection.
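To put rough numbers on the setup, here is a minimal sketch of the data rate the FPGA side needs. It assumes that "one instruction per clock cycle" means a full 36-byte instruction is consumed on every 0.5 MHz cycle; the variable names are only for illustration.

```python
# Figures taken from the description above; names are illustrative only.
INSTRUCTION_BYTES = 36      # size of one instruction
FPGA_CLOCK_HZ = 500_000     # 0.5 MHz clock that consumes instructions
FIFO_BYTES = 2048           # FPGA-side FIFO size

# Assumption: one whole instruction is consumed on every clock cycle.
required_bytes_per_s = INSTRUCTION_BYTES * FPGA_CLOCK_HZ

# Whole instructions that fit in the FIFO, and how long they last.
fifo_instructions = FIFO_BYTES // INSTRUCTION_BYTES
fifo_drain_us = fifo_instructions * 1_000_000 / FPGA_CLOCK_HZ

print(f"required throughput: {required_bytes_per_s / 1e6:.1f} MB/s")
print(f"FIFO holds {fifo_instructions} instructions, "
      f"drained in {fifo_drain_us:.0f} us")
```

Under that assumption the host has to sustain about 18 MB/s, and a full FIFO only covers a little over 100 microseconds of work.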
I'm trying to reduce the latency of sending instructions from my host computer to the FX3, because the FPGA isn't receiving data from the FX3 fast enough. On the FX3 I have 45 DMA buffers of 2048 bytes each for the FPGA to read from. Because the instructions are small, I send them in chunks, filling each buffer with 2016 bytes of data. However, when testing, I see a significant delay, about 100 milliseconds, when switching between DMA buffers. The last instruction at the end of one DMA buffer and the first instruction at the beginning of the next were spaced 7 clock cycles apart when sent, but the results show that they were received 1000 cycles apart.
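For reference, this is how I convert the cycle counts into time. It is only a sketch and assumes the timestamps count cycles of the 0.5 MHz FPGA clock; the helper name `cycles_to_us` is made up for this example.

```python
# Figures taken from the description above; names are illustrative only.
FPGA_CLOCK_HZ = 500_000        # 0.5 MHz timestamp clock (assumed)
INSTRUCTION_BYTES = 36
BUFFER_PAYLOAD_BYTES = 2016    # bytes actually written into each DMA buffer
BUFFER_COUNT = 45


def cycles_to_us(cycles):
    """Convert FPGA clock cycles to microseconds."""
    return cycles * 1_000_000 / FPGA_CLOCK_HZ


instructions_per_buffer = BUFFER_PAYLOAD_BYTES // INSTRUCTION_BYTES
intended_gap_us = cycles_to_us(7)       # spacing I asked for
observed_gap_us = cycles_to_us(1000)    # spacing the results show

# Work queued on the FX3 if all 45 buffers are full, one instruction per cycle.
queued_work_us = cycles_to_us(instructions_per_buffer * BUFFER_COUNT)

print(f"{instructions_per_buffer} instructions per buffer")
print(f"intended gap {intended_gap_us:.0f} us, observed gap {observed_gap_us:.0f} us")
print(f"all buffers full cover about {queued_work_us:.0f} us of work")
```

With that assumption, 1000 cycles is about 2 ms at the buffer boundary versus the 14 microseconds I intended, so the 100 ms figure and the cycle count do not line up exactly and should be read as rough.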
I'm trying to get the delay down to microseconds. I tried switching to a DMA AUTO configuration and changing the endpoints from BULK to INTERRUPT, but I can't seem to get the latency down. I'm using libusb to send data from my Wandboard with asynchronous transfers. I can't increase the burst length beyond 1. Am I just running into the limitations of USB 2.0, or is there something else I can do to reduce the latency?
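As a sanity check on the USB 2.0 question, here is a back-of-the-envelope sketch of the raw wire time for one 2016-byte chunk. It assumes the link actually enumerates at high speed (480 Mbit/s signalling, 512-byte bulk packets) and ignores protocol overhead and host-side scheduling, so it is a lower bound, not a measurement.

```python
import math

# Assumptions: high-speed USB 2.0 link, bulk endpoint with 512-byte packets.
HS_BIT_RATE = 480_000_000      # raw signalling rate, bits per second
HS_BULK_MAX_PACKET = 512       # max packet size of a high-speed bulk endpoint
MICROFRAME_US = 125            # high-speed microframe period
TRANSFER_BYTES = 2016          # one chunk of instructions, from the post

# Number of packets per chunk and size of the final (short) packet.
packets = math.ceil(TRANSFER_BYTES / HS_BULK_MAX_PACKET)
last_packet_bytes = TRANSFER_BYTES - (packets - 1) * HS_BULK_MAX_PACKET

# Raw time on the wire, ignoring handshakes, framing, and host latency.
raw_wire_us = TRANSFER_BYTES * 8 * 1_000_000 / HS_BIT_RATE

print(f"{packets} packets, last one {last_packet_bytes} bytes")
print(f"raw wire time about {raw_wire_us:.1f} us "
      f"(microframe is {MICROFRAME_US} us)")
```

If these assumptions hold, the chunk itself needs only tens of microseconds on the wire, which is why I suspect the gap comes from how the transfers are scheduled rather than from raw USB 2.0 bandwidth.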