After spending close to two months going back and forth with someone from Cypress tech support with little success, I am hopeful that the community at large can point us in the right direction.
I am a SW engineer responsible for all of the USB-related software for a commercial test / measurement instrument that uses USB as its primary internal communications bus. Our device consists of 2 physical boxes, in which we have integrated 3 TetraHubs. One is in the primary box, on a custom board connected to a COTS PC, and two (both on one board) are in the instrument-specific physical box. The bus architecture has all three chained together, so:
BOX1( PC -> TetraHub1 -> ) ==> BOX2 (TetraHub2 -> TetraHub3)
Communication between the PC and the TetraHubs all runs at high speed (480 Mbps). Our custom hardware devices are connected to TetraHub2 and TetraHub3, and all of them are full-speed devices (12 Mbps). The PC runs Ubuntu Linux, and the custom devices are based on Atmel AT91SAM7X parts running FreeRTOS.
We have been actively working on improving signal quality and reducing noise, and everything works flawlessly 99.9% of the time. But every so often we see very low-level communication failures coming from TetraHub3. When this occurs, either the entire chip appears to become disconnected (according to the OS), or we see various errors on all of the full-speed downstream ports of TetraHub3. Swapping devices between TetraHub2 and TetraHub3 shows that the problem stays with TetraHub3. (I.e., a device on TetraHub3 that is having comm issues works correctly when plugged into TetraHub2, and a device working correctly on TetraHub2 begins having comm issues when connected to TetraHub3.)
When our HW engineer probes the full-speed channels downstream of TetraHub3, we see that TetraHub3 intermittently appears to "freeze" when initiating communication with a downstream device. Captured on an Agilent scope with the USB package installed, both D+ and D- toggle normally while a PID is being transmitted, but sometimes both lines just stop, which produces bit-stuffing errors, CRC errors, etc. TetraHub3 simply stops toggling D+ and D-. According to the scope there is no noise on the D+/D- lines, the edges are sharp and beautiful, and it all just freezes. Then TetraHub3 recovers, re-transmits the PID, and communication picks up again.
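To make the scope observation concrete, here is a minimal Python sketch (not tied to any Cypress or Agilent tool) of why a transmitter that stops toggling mid-packet is reported as a bit-stuffing error. It assumes the standard USB NRZI rules: a 0 bit is sent as a J/K transition, a 1 bit as no transition, and the sender stuffs a 0 after six consecutive 1s, so seven bit times with no transition are illegal. All helper names and the example IN token are hypothetical and for illustration only; SE0/EOP handling and receive-side unstuffing are left out.

```python
# Illustrative sketch only: helper names are hypothetical, not from any USB library.
# Models full-speed USB line states as 'J'/'K' samples, one per bit time.
# SE0/EOP handling and receive-side bit unstuffing are deliberately omitted.

SYNC = [0, 0, 0, 0, 0, 0, 0, 1]    # sync field, LSB first (KJKJKJKK on the wire)
IN_PID = [1, 0, 0, 1, 0, 1, 1, 0]  # IN token PID 0x69, LSB first
PACKET = SYNC + IN_PID


def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s (USB bit stuffing)."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            out.append(0)
            run = 0
    return out


def nrzi_encode(bits, idle="J"):
    """NRZI: a 0 toggles the line state, a 1 leaves it unchanged."""
    states, state = [], idle
    for b in bits:
        if b == 0:
            state = "K" if state == "J" else "J"
        states.append(state)
    return states


def nrzi_decode(states, idle="J"):
    """Inverse of nrzi_encode: a transition decodes to 0, no transition to 1."""
    bits, prev = [], idle
    for s in states:
        bits.append(0 if s != prev else 1)
        prev = s
    return bits


def insert_pause(states, start, length, idle="J"):
    """Simulate the transmitter freezing: hold the current state for `length` bit times."""
    held = states[start - 1] if start > 0 else idle
    return states[:start] + [held] * length + states[start:]


def find_bit_stuff_violation(bits):
    """Index of the seventh consecutive 1 (an illegal run), or None if the stream is legal."""
    run = 0
    for i, b in enumerate(bits):
        run = run + 1 if b == 1 else 0
        if run == 7:
            return i
    return None


def pid_valid(pid_bits):
    """PID check: the upper nibble must be the ones' complement of the lower nibble."""
    return all(pid_bits[i] != pid_bits[i + 4] for i in range(4))


if __name__ == "__main__":
    wire = nrzi_encode(bit_stuff(PACKET))
    print("clean:", find_bit_stuff_violation(nrzi_decode(wire)))
    long_stall = nrzi_decode(insert_pause(wire, start=10, length=8))
    print("8-bit stall:", find_bit_stuff_violation(long_stall))
    short_stall = nrzi_decode(insert_pause(wire, start=10, length=3))
    print("3-bit stall PID valid:", pid_valid(short_stall[8:16]))
```

In this model, an eight-bit-time stall inside the PID is flagged as a bit-stuffing violation, while a shorter three-bit stall passes the stuffing rule but still fails the PID check. That would be consistent with one underlying stall showing up as a mix of bit-stuff and PID/CRC errors, depending on how long the lines sit idle.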
This has been a very intermittent problem, and some instruments appear to have more problems than others. Even gathering this much information has been very difficult.
I guess I am looking for guidance, or maybe an "a-ha" moment from someone. I do not think these parts are defective; I am sure we are doing something wrong that is causing the part to misbehave. But no one here has any idea what that could be, and our Cypress tech support contact is not being very clear about what he is looking for. Could USB errors upstream of TetraHub3 (noise, etc.) cause it to freeze when sending data downstream, such that it produces bit-stuffing and CRC errors? Has anyone else seen anything remotely like this? If someone could confirm that bad upstream communication can make the part misbehave in such a strange way, I would be more receptive to the suggestions tech support is making, but as of now he denies that upstream noise could cause the malfunction we are seeing.
Any advice or suggestions would be greatly appreciated. Thanks in advance.