In this post, I show a solution I worked out that uses the PSoC 5LP UDB control registers and forced UDB placement to write 32 bits in parallel to non-contiguous GPIO pins with a single write instruction. The same approach works for anything from 8 to 32 pins, but I am showing a 32-bit example. I found this extremely helpful and wanted to share it with the community (the project is attached).
I needed a way to use the PSoC's CPU to write 32 bits to the GPIO in parallel at a rate of about 16 MHz. The trick, which should not be attempted unless you understand what data you could be overwriting in memory, is to force the control registers into UDBs that sit next to each other in memory. Writing to the GPIO through the UDBs lets you update more pins at once than the CPU could on its own. The "map" I used to decide which UDBs to force the control registers into was provided by reddit user eric_j here. The post disputes the accuracy of the map, but for the UDBs I chose, it was accurate. The UDB placement I used can be seen in the image below:
These placements correspond to the control registers in the following schematic:
This placement leverages the fact that the 8-bit control registers for these UDBs are contiguous in the PSoC memory.
These are the addresses of each "[component name]_Sync_ctrl_reg__CONTROL_REG" as defined in the cyfitter.h file:
Bits_0_to_7 address = 0x4000647A
Bits_8_to_15 address = 0x4000647B
Bits_16_to_23 address = 0x4000647C
Bits_24_to_31 address = 0x4000647D
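Since everything that follows depends on those four addresses being consecutive, it may be worth guarding against a placement change. The following is a minimal sketch, not part of the attached project: the four macros are local stand-ins that reuse the addresses listed above (in a real build they would come from cyfitter.h), and regs_are_contiguous and check_placement are hypothetical helpers. The compile-time check assumes a compiler that accepts C11's _Static_assert.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the cyfitter.h definitions, using the addresses listed above.
   In a real project, include "cyfitter.h" instead of defining these by hand. */
#define Bits_0_to_7_Sync_ctrl_reg__CONTROL_REG    0x4000647Au
#define Bits_8_to_15_Sync_ctrl_reg__CONTROL_REG   0x4000647Bu
#define Bits_16_to_23_Sync_ctrl_reg__CONTROL_REG  0x4000647Cu
#define Bits_24_to_31_Sync_ctrl_reg__CONTROL_REG  0x4000647Du

/* Hypothetical helper: returns 1 when the four addresses are consecutive
   ascending bytes starting at a0, and 0 otherwise. */
static int regs_are_contiguous(uint32_t a0, uint32_t a1, uint32_t a2, uint32_t a3)
{
    return (a1 == a0 + 1u) && (a2 == a0 + 2u) && (a3 == a0 + 3u);
}

/* Compile-time guard: the build fails if a register moves out of sequence. */
_Static_assert(
    Bits_8_to_15_Sync_ctrl_reg__CONTROL_REG  == Bits_0_to_7_Sync_ctrl_reg__CONTROL_REG + 1u &&
    Bits_16_to_23_Sync_ctrl_reg__CONTROL_REG == Bits_0_to_7_Sync_ctrl_reg__CONTROL_REG + 2u &&
    Bits_24_to_31_Sync_ctrl_reg__CONTROL_REG == Bits_0_to_7_Sync_ctrl_reg__CONTROL_REG + 3u,
    "control registers are not contiguous");

/* Runtime version of the same check, e.g. called once at startup in a
   debug build when _Static_assert is not available. */
void check_placement(void)
{
    assert(regs_are_contiguous(Bits_0_to_7_Sync_ctrl_reg__CONTROL_REG,
                               Bits_8_to_15_Sync_ctrl_reg__CONTROL_REG,
                               Bits_16_to_23_Sync_ctrl_reg__CONTROL_REG,
                               Bits_24_to_31_Sync_ctrl_reg__CONTROL_REG));
}
```

If the forced placement is ever lost and the fitter picks different UDBs, a check like this should catch it before a 32-bit write lands on unrelated registers.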
Writing an unsigned 32-bit value to the address of the first 8-bit register spills the upper 24 bits into the next three 8-bit registers, resulting in a parallel 32-bit write to the connected GPIO pins. A similar configuration using status registers can be used for reads. Keep in mind all the possible issues that come with abusing pointers and overwriting memory. Below is the code required to do this.
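As a rough illustration of the idea (a minimal sketch under stated assumptions, not the attached project's code), the example below models the four adjacent control registers with a union so that it can run on a desktop compiler. The names ctrl_bank_t, write_pins32, and demo_write32 are hypothetical. It assumes a little-endian machine, which matches the PSoC 5LP's Cortex-M3, so the least significant byte lands at the lowest address (Bits_0_to_7).

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of four adjacent 8-bit control registers.
   ctrl[0] stands in for Bits_0_to_7, ctrl[3] for Bits_24_to_31. */
typedef union {
    uint32_t word;     /* the same four bytes viewed as one 32-bit value */
    uint8_t  ctrl[4];  /* the four individual 8-bit registers */
} ctrl_bank_t;

/* A single 32-bit store through a volatile pointer. */
static void write_pins32(volatile uint32_t *base, uint32_t value)
{
    *base = value;
}

/* On the target, base would presumably be the first register's address:
       (volatile uint32_t *)Bits_0_to_7_Sync_ctrl_reg__CONTROL_REG
   This is an assumption based on the addresses listed above. */

void demo_write32(void)
{
    ctrl_bank_t bank = { 0 };

    write_pins32(&bank.word, 0xDDCCBBAAu);

    /* Little-endian layout: the low byte goes to the lowest address. */
    assert(bank.ctrl[0] == 0xAA);  /* Bits_0_to_7   */
    assert(bank.ctrl[1] == 0xBB);  /* Bits_8_to_15  */
    assert(bank.ctrl[2] == 0xCC);  /* Bits_16_to_23 */
    assert(bank.ctrl[3] == 0xDD);  /* Bits_24_to_31 */
}
```

The volatile qualifier matters on the target: without it, the compiler is free to drop or merge stores that it considers redundant.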
Note: I have achieved write rates above 20 MHz using optimized code in which the values are written as in-line literals and the for-loop is unrolled.
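The difference between the two styles mentioned in the note might look like the sketch below. This is an illustration, not the optimized code I measured: burst_loop, burst_unrolled, and demo_burst are hypothetical names, the literal values are arbitrary, and out stands for the 32-bit-wide pointer to the first control register. The unrolled version drops the loop counter, the branch, and the table load on every write, which is presumably where the extra speed comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Looped version: each iteration loads a value from the table,
   stores it, then updates and tests the loop counter. */
static void burst_loop(volatile uint32_t *out, const uint32_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *out = data[i];
    }
}

/* Unrolled version with in-line literals: one store per line and no loop
   overhead. 'volatile' keeps the compiler from merging the stores. */
static void burst_unrolled(volatile uint32_t *out)
{
    *out = 0x00000001u;
    *out = 0x00000002u;
    *out = 0x00000004u;
    *out = 0x00000008u;
}

void demo_burst(void)
{
    uint32_t pins = 0;  /* host-side stand-in for the control registers */
    const uint32_t pattern[4] = { 1u, 2u, 4u, 8u };

    burst_loop(&pins, pattern, 4);
    assert(pins == 8u);  /* the last value written is what remains */

    pins = 0;
    burst_unrolled(&pins);
    assert(pins == 8u);
}
```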
I am no expert in embedded systems or PSoCs, so there may be better ways to do this, but this was the only solution I could come up with, and it worked amazingly well. I would love to hear any thoughts, criticisms, improvements, or applications that you can think of for this!