You can probably find the information in the ARM Cortex-M3 Processor Technical Reference Manual at the URL below.
However, I'm afraid that a delay implemented as a loop can be preempted by interrupts, so it may not be very accurate.
Depending on the granularity you need, I would recommend either a hardware timer (for fine timing)
or SysTick (for delays on the order of milliseconds or longer).
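To illustrate the tick-based approach, here is a minimal sketch in plain C. It assumes some 1 ms periodic interrupt (SysTick or a hardware timer) calls a handler; the names g_ms_ticks, tick_isr, delay_expired and delay_ms_blocking are made up for this example and are not part of any Cypress or ARM API. How you actually hook the handler up depends on your project.

```c
#include <stdint.h>

/* Hypothetical millisecond counter. On real hardware a SysTick (or timer)
   interrupt handler would increment it once per millisecond. */
static volatile uint32_t g_ms_ticks = 0;

/* Call this from the 1 ms tick interrupt handler (assumed to exist). */
void tick_isr(void)
{
    g_ms_ticks++;
}

/* Returns 1 once at least delay_ms ticks have passed since start.
   Unsigned subtraction keeps the result correct across counter wraparound. */
int delay_expired(uint32_t start, uint32_t now, uint32_t delay_ms)
{
    return (uint32_t)(now - start) >= delay_ms;
}

/* Busy-wait on the tick counter. This only returns if tick_isr() is
   really being called by a periodic interrupt. */
void delay_ms_blocking(uint32_t delay_ms)
{
    uint32_t start = g_ms_ticks;
    while (!delay_expired(start, g_ms_ticks, delay_ms)) {
        /* Time spent in other interrupts is still counted by the tick. */
    }
}
```

Because the elapsed time comes from the counter rather than from counting loop iterations, other interrupts running in between do not stretch the delay the way they would with a NOP loop. The resolution is one tick, so this is only suitable for millisecond-order delays.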
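I believe "__asm("nop");" is the instruction to use. RISC architectures in the past, besides being "Reduced Instruction Set Computing", claimed that each instruction completes in one CPU clock cycle. I believe the ARM CPU of the PSoC5 mostly does that, but not every Cortex-M3 instruction is single-cycle (branches, loads and stores can take more), so a loop around a NOP will cost more than one cycle per iteration.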
Additionally, the CPU clock is the BUS_CLOCK, which is derived from the MASTER_CLOCK. Check your DWR/Clocks configuration to see which divisor is applied to the MASTER_CLOCK to provide BUS_CLOCK.
Cypress provides a function called CyDelayUs(uint16 microseconds). As I understand it, it computes the number of "NULL" (no-op) instructions to loop on from the BUS_CLOCK frequency.
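The arithmetic behind that kind of delay can be sketched as follows. This is only my illustration of the idea, not Cypress's actual implementation; bus_clock_hz and cycles_for_us are hypothetical helper names, and the clock values in the note after the code are example numbers, not taken from any particular design.

```c
#include <stdint.h>

/* BUS_CLOCK frequency from MASTER_CLOCK and the divider set in the
   DWR/Clocks configuration. Returns 0 for an invalid divider of 0. */
uint32_t bus_clock_hz(uint32_t master_clock_hz, uint32_t divider)
{
    if (divider == 0u) {
        return 0u;
    }
    return master_clock_hz / divider;
}

/* Number of CPU cycles that span the requested number of microseconds.
   The multiplication is done in 64 bits so it cannot overflow. */
uint32_t cycles_for_us(uint32_t bus_hz, uint16_t microseconds)
{
    return (uint32_t)(((uint64_t)bus_hz * microseconds) / 1000000u);
}
```

For example, with a 48 MHz MASTER_CLOCK and a divider of 2, bus_clock_hz gives 24 MHz, and cycles_for_us(24000000, 10) gives 240 cycles for a 10 us delay. If the loop body takes more than one cycle per iteration, the iteration count has to be divided accordingly, which is one more reason to use the vendor's delay functions instead of a hand-written loop.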