Because the DMA buffer is 16KB and you send only 2KB from the host to the FX3, the FX3's DMA engine considers the data insufficient to push to the P-port. It waits until the DMA buffer is full, which is why, as you saw, the additional "8 packets" of data were needed.
You could send a zero-length packet after the 2KB. The DMA engine then treats the data as ending in a short packet, does not wait for the DMA buffer to fill, and pushes the data to the P-port immediately.
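To make the rule concrete, here is a minimal sketch in plain C of the behaviour described above: a buffer is handed on when it is full or when a short packet (including a zero-length one) arrives. This is a simplified model, not FX3 SDK code; the names dma_model, dma_model_packet and dma_model_transfer are hypothetical, and it assumes the buffer size is a multiple of the 1024-byte USB 3.0 bulk packet size.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of one producer-side DMA buffer (illustrative only). */
typedef struct {
    size_t buf_size;    /* DMA buffer size in bytes, e.g. 16384 */
    size_t max_packet;  /* endpoint max packet size, 1024 for USB 3.0 bulk */
    size_t filled;      /* bytes currently waiting in the buffer */
    size_t last_commit; /* size of the most recently committed buffer */
} dma_model;

/* Feed one USB packet of `len` bytes. Returns 1 if this packet caused the
 * buffer to be committed (pushed on to the consumer), 0 if it is still waiting. */
static int dma_model_packet(dma_model *m, size_t len)
{
    assert(m->max_packet > 0 && m->buf_size % m->max_packet == 0);
    assert(len <= m->max_packet);

    m->filled += len;
    /* Commit on a short packet (len < max_packet, including 0) or when full. */
    if (len < m->max_packet || m->filled == m->buf_size) {
        m->last_commit = m->filled;
        m->filled = 0;
        return 1;
    }
    return 0;
}

/* Send a host transfer of `len` bytes as max-size packets, optionally followed
 * by a zero-length packet. Returns the number of buffers committed. */
static int dma_model_transfer(dma_model *m, size_t len, int add_zlp)
{
    int commits = 0;
    while (len > 0) {
        size_t pkt = len < m->max_packet ? len : m->max_packet;
        commits += dma_model_packet(m, pkt);
        len -= pkt;
    }
    if (add_zlp)
        commits += dma_model_packet(m, 0);
    return commits;
}
```

In this model, a 2048-byte transfer into a 16384-byte buffer commits nothing and leaves 2048 bytes waiting; following it with a zero-length packet commits a single 2048-byte buffer.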
An effective way to solve the problem is to set DMA_Size to 1KB (equal to one standard USB 3.0 packet) and the DMA buffer count to 16.
Transfer efficiency will be lower than with your current setting, but it avoids the problem.
The second way to avoid your problem is to set DMA_Size to 2KB and the DMA buffer count to 8.
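The trade-off between these settings can be sketched with a little arithmetic in plain C. This is only an illustration under stated assumptions: the helper names are hypothetical (not from the FX3 SDK), transfers are assumed to consist of full-size packets with no short or zero-length packet, and buffers are assumed to start empty.

```c
#include <assert.h>
#include <stddef.h>

/* Total buffer memory used by a channel with `count` buffers of `buf_size` bytes. */
static size_t cfg_total_bytes(size_t buf_size, size_t count)
{
    return buf_size * count;
}

/* Bytes left waiting in a partially filled buffer after a transfer of `len`
 * bytes. 0 means every byte was forwarded without needing more data. */
static size_t cfg_bytes_stuck(size_t buf_size, size_t len)
{
    assert(buf_size > 0);
    return len % buf_size;
}

/* Number of buffer commits needed to move `total` bytes. More commits for the
 * same amount of data means more per-buffer overhead, hence lower efficiency. */
static size_t cfg_commits_for(size_t buf_size, size_t total)
{
    assert(buf_size > 0);
    return (total + buf_size - 1) / buf_size;
}
```

With a 2048-byte transfer, cfg_bytes_stuck returns 2048 for a 16384-byte buffer but 0 for both 1024-byte and 2048-byte buffers, and both proposed settings use the same 16384 bytes of buffer memory in total. Moving 1 MiB takes 64 commits with 16KB buffers, 512 with 2KB buffers and 1024 with 1KB buffers, which is where the efficiency cost comes from.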
Thanks for the reply and for confirming the behaviour. I'm looking at libusb to see if that's a known issue. Unfortunately, your second suggestion does not work for us, since it would reduce the USB throughput too much (I followed the AN to optimize the USB data bandwidth for the current configuration).