In the main function, it seems that some of the conditions given in the datasheet are not met:
"Note that one component clock pulse is required to start the component logic after this function is called."
Also, there is no CyDelay() between xx_WriteData() and the load/clock signals, etc. A delay is required because the API calls are executed by the CPU and complete much faster than the Shift Register can handle.
xx_WriteData() returns a cystatus; check that return value to confirm the write was successful.
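The points above could be sketched roughly as follows. This is only an illustration, not code from your project or from the datasheet: ShiftReg_1 is a hypothetical instance name standing in for "xx", the 10 us delay is an assumed value that you would replace with at least one period of your component clock, and the helper names shiftreg_start_and_settle() and shiftreg_write_checked() are made up for this example. The first section contains host-side stand-ins so the sketch compiles off-target; in a PSoC Creator project you would delete it and include project.h instead.

```c
#include <stdint.h>

/* ---- Host-side stand-ins (assumptions) so the sketch compiles off-target. ----
 * In a PSoC Creator project, delete this section and #include "project.h".
 * ShiftReg_1 is a hypothetical instance name standing in for "xx". */
typedef unsigned long cystatus;
#define CYRET_SUCCESS   0u
#define STUB_NOT_READY  1u  /* placeholder for any non-success status */

static unsigned stub_busy_count;    /* number of upcoming writes the stub rejects */
static uint8_t  stub_last_written;  /* last value the stub accepted */
static unsigned stub_delay_calls;   /* how many times the delay stub ran */

static void ShiftReg_1_Start(void) { }
static void CyDelayUs(uint16_t us) { (void)us; stub_delay_calls++; }
static cystatus ShiftReg_1_WriteData(uint8_t value)
{
    if (stub_busy_count > 0u) {
        stub_busy_count--;
        return STUB_NOT_READY;
    }
    stub_last_written = value;
    return CYRET_SUCCESS;
}
/* ---- End of stand-ins. ---- */

/* Assumed delay covering at least one component clock period; adjust it. */
#define SR_CLOCK_PERIOD_US 10u

/* Start the component, then wait so that it sees at least one clock pulse
 * before any other API call, as the datasheet note asks. */
void shiftreg_start_and_settle(void)
{
    ShiftReg_1_Start();
    CyDelayUs(SR_CLOCK_PERIOD_US);
}

/* Write one value and check the returned status. The write is attempted at
 * least once and at most max_tries times, with a short delay after every
 * attempt. Returns the status of the last attempt. */
cystatus shiftreg_write_checked(uint8_t value, unsigned max_tries)
{
    cystatus status;
    unsigned tries = 0u;

    do {
        status = ShiftReg_1_WriteData(value);
        CyDelayUs(SR_CLOCK_PERIOD_US);  /* give the hardware time before the next access */
        tries++;
    } while (status != CYRET_SUCCESS && tries < max_tries);

    return status;
}
```

In main() you would call shiftreg_start_and_settle() once, then shiftreg_write_checked() for each value, and only toggle the load signal if the returned status is CYRET_SUCCESS.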
You can find a code example here: CE95372 - Shift register with PSoC 3/5LP | Cypress Semiconductor