I managed to get the application talking to the hardware somewhat correctly. (I have a few quirks, but I'm not chasing those right now.) We switched direction from another USB solution to a high-speed device because of the frame timing. The system has very low data rates overall (most packets are only a few bytes at a time and not that frequent), but a full-speed solution was not working. With the device I tested (DevaSys card), the delays in the system were 10-20 times longer than with the old ISA interface. When I looked at bus timing, the frames were 4 ms.
I researched frame timing and found that high speed is supposed to have 125 us microframes. With that kind of timing, I should be able to get the system working at about the same speed as ISA, with possibly a few non-critical areas being a little slower.
In the initialization code, the software sends and receives one byte at each of 256 possible addresses in the external chassis (it's looking to see what hardware is installed in the chassis). It does this in a tight for loop that takes about 15 us per I/O call when using the ISA card. This is not a critical part of the program, and I left it as is to compare timing between interfaces (I plan to move the polling to firmware eventually and just read all 256 bytes in one bulk transfer, but that's phase II).
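To make the comparison concrete, here is a minimal sketch of that kind of scan loop with per-call timing added. It is in Python rather than the application's own language, and `io_exchange` and `fake_io_exchange` are hypothetical stand-ins, not the real ISA or USB driver calls.

```python
import time

def scan_chassis(io_exchange, addresses=range(256)):
    """Write one byte to each address, read one back, and time each call."""
    present = []
    timings_us = []
    for addr in addresses:
        start = time.perf_counter()
        # Hypothetical call: write one byte to addr, return the byte read back
        # (or None if nothing answers at that address).
        reply = io_exchange(addr, 0x00)
        timings_us.append((time.perf_counter() - start) * 1e6)
        if reply is not None:
            present.append(addr)
    return present, timings_us

def fake_io_exchange(addr, value):
    # Stand-in for real hardware: pretend only addresses 0x10 and 0x20 respond.
    return 0xFF if addr in (0x10, 0x20) else None

present, timings_us = scan_chassis(fake_io_exchange)
print("addresses that responded:", present)
print("slowest call (us):", max(timings_us))
```

Swapping the fake function for the real I/O call would give a per-call figure that can be set next to the logic analyzer trace.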
Looking at the traffic with a logic analyzer, I'm seeing per transaction timings of 3-4ms, not 125us. What could be causing the timing to be 24-32 times slower than expected? I find it interesting that the bus timing is almost identical to what I saw with the full speed device we rejected.
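The arithmetic behind those figures, as a small sketch. It assumes the best case is one transaction per 125 us microframe, which is my expectation and not something I have confirmed; the constant and function names are just for illustration.

```python
MICROFRAME_US = 125   # expected high-speed microframe period
ISA_CALL_US = 15      # measured per-call time on the ISA card
ADDRESSES = 256       # addresses polled during initialization

def slowdown(observed_us, expected_us=MICROFRAME_US):
    """How many times slower the observed timing is than the expected one."""
    return observed_us / expected_us

def scan_time_ms(per_call_us, n=ADDRESSES):
    """Total time for the initialization scan at a given per-call cost."""
    return per_call_us * n / 1000

cases = [
    ("ISA", ISA_CALL_US),
    ("one call per microframe (assumed best case)", MICROFRAME_US),
    ("observed, low end", 3000),
    ("observed, high end", 4000),
]
for label, per_call_us in cases:
    print(f"{label}: {scan_time_ms(per_call_us)} ms for the scan, "
          f"{slowdown(per_call_us)}x the microframe period")
```

So the 256-address scan goes from under 4 ms on ISA to somewhere between 768 ms and about 1 s at the timings I'm seeing, versus 32 ms if each call fit in one microframe.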
I am doing all I/O via control endpoint transfers at the moment. Many times I need to write a couple of bytes and then immediately read one or two. The bidirectionality of the control endpoint enables me to combine these into one USB call.
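For illustration, here is a sketch of how a combined write-then-read could be packed into one control transfer. The data stage of a control transfer runs in only one direction, so this assumes the bytes to write travel in the `wValue` and `wIndex` fields of the 8-byte setup packet and the bytes read come back in the IN data stage. The request code and field usage are hypothetical, not necessarily what my firmware does.

```python
import struct

# bmRequestType: device-to-host (0x80) | vendor request (0x40) | recipient device (0x00)
VENDOR_IN = 0xC0
# Hypothetical vendor request code meaning "write the given bytes, then read back".
REQ_WRITE_THEN_READ = 0x01

def build_setup_packet(request, w_value, w_index, w_length, request_type=VENDOR_IN):
    """Pack the 8-byte USB setup packet (all multi-byte fields little-endian)."""
    return struct.pack("<BBHHH", request_type, request, w_value, w_index, w_length)

# Assumed layout: wIndex carries the chassis address, wValue carries two bytes
# to write, and wLength asks for two bytes back in the IN data stage.
packet = build_setup_packet(REQ_WRITE_THEN_READ, w_value=0x1234, w_index=0x0042, w_length=2)
print(packet.hex())  # c001341242000200
```

Even when combined like this, the host still has to schedule the setup, data, and status stages, so one call is not a single bus transaction.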
What is causing this, and how do I fix it?