A bit more digging reveals that the status/control registers are inherently tied to the 16-bit concatenation setting of the UDB which they are allocated into. Therefore a placement_force workaround must also force the datapath allocation so as to avoid the unwanted working register mode.
Furthermore, a shared wait-state register (CYREG_BCTL[0-1]_WAIT_CFG) apparently exists for each of the two UDB banks, and both of them add two wait-states to every read in my present configuration.
How are these wait-state settings determined, and how do I optimize them? Dropping the system down to a single 1 MHz clock shaves off one wait-state without reaching zero. Presumably logic requiring fast bus access should be carefully allocated into one of the UDB banks, taking care not to inadvertently introduce wait-states.
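For anyone wanting to check which design changes actually move these settings, here is a minimal sketch of how the two registers could be snapshotted and compared before and after a change. It only handles the raw byte values and deliberately does not decode any bit fields, since the field layout is not established in this thread. All names (`reg8_reader_t`, `udb_wait_cfg_t`, `udb_wait_cfg_snapshot`, `sim_read8`, etc.) are hypothetical, and the function-pointer indirection is only there so that the code also builds on a host PC.

```c
#include <stdint.h>
#include <stdio.h>

/* Reads one 8-bit register at the given address. The indirection is an
 * assumption made for host testing; it is not part of any Cypress API. */
typedef uint8_t (*reg8_reader_t)(uint32_t addr);

/* Raw contents of the two wait-state registers, one per UDB bank. */
typedef struct {
    uint8_t bank0;
    uint8_t bank1;
} udb_wait_cfg_t;

/* Capture both registers without interpreting any bit fields. */
udb_wait_cfg_t udb_wait_cfg_snapshot(reg8_reader_t read8,
                                     uint32_t addr_bank0,
                                     uint32_t addr_bank1)
{
    udb_wait_cfg_t s;
    s.bank0 = read8(addr_bank0);
    s.bank1 = read8(addr_bank1);
    return s;
}

/* Returns 1 if either bank's raw value differs between two snapshots. */
int udb_wait_cfg_changed(udb_wait_cfg_t a, udb_wait_cfg_t b)
{
    return (a.bank0 != b.bank0) || (a.bank1 != b.bank1);
}

/* Print the raw values so that builds can be compared by eye. */
void udb_wait_cfg_print(const char *label, udb_wait_cfg_t s)
{
    printf("%s: bank0=0x%02X bank1=0x%02X\n",
           label, (unsigned)s.bank0, (unsigned)s.bank1);
}

/* Host-side stand-in for the hardware: two simulated registers. */
uint8_t sim_wait_regs[2];

uint8_t sim_read8(uint32_t addr)
{
    return sim_wait_regs[addr & 1u];
}
```

On the target, the reader would presumably be a one-line wrapper around `CY_GET_REG8()`, called with `CYREG_BCTL0_WAIT_CFG` and `CYREG_BCTL1_WAIT_CFG` as the two addresses. I have not verified that on hardware, so treat it as an assumption.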
The subject is rather deep; raising a Cypress support case would be appropriate.
One way to have a "wider" register is to exploit a FIFO. See the updated component in this thread.
And this may be relevant as well. Sorry, I can't help beyond that...
Thank you, I think I'll give support a shot if I can't figure things out by myself.
A parallel datapath "wrapper" might work here. Forwarding data to/from the FIFO would introduce an extra cycle of delay in either direction. However, the status register is already implicitly used for the parallel input mode of the datapath, and so could probably be read directly in software.
To be honest, I get the feeling that the documentation and tools for optimizing a PSoC design are fairly spotty, with a general focus more on ease-of-use.