PSoC6 I/O toggle speed of M4 limited by clock speed of M0

DoDe_295886 · Jan 23, 2020

CY8C6347BZI-BLD53

Like it says... My M4 clock is running at 150MHz, and I am doing a direct I/O toggle of a GPIO port pin from the M4 core. It only toggles the output pin at a maximum of about 10MHz. When I change the speed of the M0+ clock, the I/O toggle speed changes. I change the M0+ speed through the Clk_Slow divider, leaving Clk_Peri unchanged. I've tried putting the M0 core to sleep... no change. If I really slow down the M0, say to 2MHz, the M4 toggle speed drops to 500kHz. Now, a single op-code output instruction executed by the M4 core running at 150MHz takes 150 M4 clocks, 50 Clk_Peri, or 2 M4 clocks. That seems slow for a 150MHz core executing a single op-code....
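
Turning a measured toggle rate into clock cycles per pin write is a single division per clock domain; a small helper makes the arithmetic explicit. One assumption matters: if the scope reading is the square-wave frequency, each period needs two writes (high, then low), so the write rate is twice the reading. The function names here are illustrative, not from any Cypress library.

```c
#include <assert.h>
#include <stdint.h>

/* A square wave on the pin needs two writes per period (set high, set low). */
static uint32_t writes_per_second(uint32_t square_wave_hz)
{
    return 2u * square_wave_hz;
}

/* Cycles of a given clock that elapse per pin write.
 * Integer division: any fractional cycle is truncated. */
static uint32_t clocks_per_write(uint32_t clock_hz, uint32_t writes_hz)
{
    assert(writes_hz > 0u);
    return clock_hz / writes_hz;
}
```

For example, reading the 2MHz / 500kHz data point above as a 500kHz square wave gives 1 million writes per second, so each write spans 2 Clk_Slow cycles; 10 million writes per second at 150MHz would be 15 M4 clocks per write.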

Why does the speed of the M0 core have anything to do with the speed of the M4 I/O? And what does this mean for things I will want to do in the future, like a multi-tasking system? Does the M4 core get completely stalled by the slow I/O? Does it initiate the I/O and then continue processing until it reaches another I/O, stalling only then? Does it stall until the M0 catches up with whatever it is doing? If both cores use separate I/O, does one stall the other even worse? A clear, coherent description of how the M4, M0+ and peripherals interact would be helpful, but endless searches through the documentation have turned up nothing.

RodolfoGL · Feb 05, 2020

Hello,

Sorry for the delay. We were checking internally why the M4 no longer toggles the I/O as fast once "Clk_slow" is lowered.

It turns out that the MMIO AHB-Lite bus interface, which is used to access the I/O registers, is clocked by "Clk_slow". That explains why the M4, even running at 150MHz, cannot toggle at 10MHz when Clk_slow drops drastically.

Note that you will want to keep the M0+ running at a higher frequency for other reasons. Some system calls are executed only on the M0+, so slowing it down drastically hurts their performance; for instance, any system calls related to security and flash writes. The DMA engine is also clocked by "Clk_slow".

We will update some key documents to point this out. Thanks for sharing your findings.
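
The explanation above can be written as a two-cap model: the pin write rate is limited both by how fast the CM4 can issue stores (Clk_Fast) and by the Clk_Slow-clocked MMIO bus, and the lower cap wins. This is a sketch, not documented bus timing: `core_cycles_per_write` and `slow_cycles_per_write` are hypothetical parameters, since the thread gives no figure for either.

```c
#include <assert.h>
#include <stdint.h>

/* Pin writes per second from one core, taking the lower of two caps:
 *   core cap: Clk_Fast / core cycles spent per write
 *   bus cap:  Clk_Slow / MMIO bus cycles per write
 * Both per-write cycle counts are assumptions supplied by the caller. */
static uint32_t effective_writes_hz(uint32_t clk_fast_hz, uint32_t core_cycles_per_write,
                                    uint32_t clk_slow_hz, uint32_t slow_cycles_per_write)
{
    assert(core_cycles_per_write > 0u && slow_cycles_per_write > 0u);
    uint32_t core_cap = clk_fast_hz / core_cycles_per_write;
    uint32_t bus_cap = clk_slow_hz / slow_cycles_per_write;
    return core_cap < bus_cap ? core_cap : bus_cap;
}
```

With a 150MHz Clk_Fast, 3 core cycles per write and a 1MHz Clk_Slow, the bus cap (1 million writes per second) wins, and raising Clk_Fast changes nothing; only raising Clk_Slow helps, which matches the behavior reported in this thread.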

RodolfoGL · Jan 23, 2020

The PSoC 6 part you are using has only one SRAM controller, which means the CM4 and CM0+ need to share it. If one processor is accessing anything in the SRAM, the other one has to wait until that transaction completes. Slowing down the CM0+ would therefore affect how long it takes to access something in the SRAM.

DoDe_295886 · Jan 24, 2020

Except that op-code fetches are from flash, not SRAM. Toggling a GPIO output bit doesn't touch SRAM either. Even if it did, throttling the 150MHz M4 core to the speed of the M0+ core because of alternating op-code fetches (or whatever) would be a huge waste. If op-code fetches of the fast core are limited by the speed of the slow core (even when it is sleeping), then this should NOT be advertised as 150MHz when the MAX speed of the slow core is 50MHz (and it would be common to run the slow core much more slowly). The very point of having a fast core is so that it can operate....... FAST.

Also, I would assume that since the M0+ core is a von Neumann architecture and the M4 core is a Harvard architecture, the op-code fetches are from separate memory spaces (a requirement of the Harvard architecture). The GPIO immediate output instruction executed by the M4 core does not access RAM at all, and only fetches from program memory that is most likely totally unavailable to the M0+ core. Throttling one based on the other seems both pointless and completely counterproductive. Thanks for the effort, but although somebody marked your answer as "Correct Answer", and it may be a good answer to some other question, it does not seem to answer *this* question at all.

DoDe_295886 · Jan 24, 2020

More information...

002-18270_PSoC_6_MCU_with_Bluetooth_LE_CY8C63x6_CY8C63x7_Registers_Technical_Reference_Manual.pdf

Page 30

It shows 3 separate SRAM regions and 3 separate SRAM controllers, so SRAM contention between a von Neumann architecture core and a Harvard architecture core should not be an issue. Limiting the M4 core to the current operating speed of the M0+ core would be a very bad design decision and is probably NOT what is happening. The question remains... what *is* happening to cause this, and how do I fix this crippling problem?

DoDe_295886 · Jan 27, 2020

More info...

It seems the core is actually still running at 150MHz, but the I/O is limited by the speed of "Clk_Slow", the M0+ clock. Various combinations of clock rates for "Clk_Fast", "Clk_Peri", and "Clk_Slow" show that some important part of the peripheral system is clocked *NOT* by "Clk_Peri" but by "Clk_Slow". If "Clk_Fast" (the M4 core) is set to 150MHz, "Clk_Peri" to 75MHz, and "Clk_Slow" to, for example, 1MHz, and the M4 core executes only the 3 machine instructions needed to increment a register and output it to a port, the port toggles only once per 1MHz Clk_Slow cycle (a 500kHz full-cycle square wave). Adding more increments of the register shows that the M4 core really is still running at 150MHz, but every time it reaches the output instruction, it stalls for the remainder of the current Clk_Slow cycle.
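
A sketch of the loop described above, plus a simple timing model of the stall hypothesis in this post: each iteration's port store cannot complete until the next Clk_Slow edge. `port_out` is a hypothetical pointer to the port's output register (on hardware it would come from the device headers; here any volatile word works, so the sketch also runs on a host). The model illustrates the observation, not documented PSoC 6 bus behavior, and assumes Clk_Fast is an integer multiple of Clk_Slow.

```c
#include <assert.h>
#include <stdint.h>

/* Increment a value and write it to the port output register.
 * Bit 0 alternates every iteration, so one observed pin toggles. */
static void toggle_loop(volatile uint32_t *port_out, uint32_t iterations)
{
    uint32_t value = 0u;
    for (uint32_t i = 0u; i < iterations; ++i) {
        value++;
        *port_out = value; /* store goes over the Clk_Slow-clocked MMIO bus */
    }
}

/* Clk_Fast cycles per iteration if the store must wait for the next
 * Clk_Slow edge; core_cycles is the instruction work per iteration. */
static uint32_t fast_cycles_per_iteration(uint32_t core_cycles,
                                          uint32_t clk_fast_hz, uint32_t clk_slow_hz)
{
    assert(clk_slow_hz > 0u && clk_fast_hz % clk_slow_hz == 0u);
    uint32_t ratio = clk_fast_hz / clk_slow_hz;         /* Clk_Fast cycles per Clk_Slow cycle */
    uint32_t slow_cycles = (core_cycles + ratio - 1u) / ratio; /* round up to a Clk_Slow edge */
    if (slow_cycles == 0u) {
        slow_cycles = 1u; /* the store itself occupies at least one bus cycle */
    }
    return slow_cycles * ratio;
}

/* Square-wave frequency on the pin: one full period takes two writes. */
static uint32_t square_wave_hz(uint32_t clk_fast_hz, uint32_t fast_cycles_per_write)
{
    return clk_fast_hz / (2u * fast_cycles_per_write);
}
```

With 150MHz and 1MHz clocks, anything from 1 to 150 core cycles per iteration lands on the same 150-cycle period (a 500kHz square wave). That is why adding increments did not change the toggle rate until the work exceeded one Clk_Slow cycle.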

So, despite what the datasheets seem to say, at least a primary part of the I/O system, the part required for the core to actually output a value to a port, does *NOT* run on "Clk_Peri" but on "Clk_Slow". So, to run at low power on the M0+ core and wake up occasionally to do something REALLY fast with the M4 core, I cannot simply run "Clk_Fast" and "Clk_Peri" relatively fast and run the M0+ core VERY slowly to save power while the M4 core sleeps. Running the M0+ core at 75MHz just to see, for example, whether the Sun has risen seems like an awful waste.
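
To check a low-power plan like this one against the constraint, here is a small host-side calculator. As I understand the PSoC 6 clock tree (verify in the TRM), Clk_Fast and Clk_Peri are divided from Clk_HF0 and Clk_Slow is divided from Clk_Peri, consistent with the post, where the Clk_Slow divider changes while Clk_Peri stays put. The helper finds the largest Clk_Slow divider that still sustains a required pin write rate. The clock values, divider arguments and `slow_cycles_per_write` are illustrative assumptions; real divider ranges and bus cycles per write must come from the documentation.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t clk_fast_hz; /* CM4 */
    uint32_t clk_peri_hz; /* peripherals */
    uint32_t clk_slow_hz; /* CM0+, and per the answer above, the MMIO bus and DMA */
} clock_tree_t;

/* Derive the three clocks from Clk_HF0 and integer dividers. */
static clock_tree_t derive_clocks(uint32_t clk_hf0_hz, uint32_t fast_div,
                                  uint32_t peri_div, uint32_t slow_div)
{
    assert(fast_div > 0u && peri_div > 0u && slow_div > 0u);
    clock_tree_t t;
    t.clk_fast_hz = clk_hf0_hz / fast_div;
    t.clk_peri_hz = clk_hf0_hz / peri_div;
    t.clk_slow_hz = t.clk_peri_hz / slow_div; /* Clk_Slow hangs off Clk_Peri */
    return t;
}

/* Largest Clk_Slow divider that still allows needed_writes_hz pin writes per
 * second at slow_cycles_per_write bus cycles each. Returns 0 if even an
 * undivided Clk_Slow (equal to Clk_Peri) is too slow. */
static uint32_t max_slow_divider(uint32_t clk_peri_hz, uint32_t needed_writes_hz,
                                 uint32_t slow_cycles_per_write)
{
    assert(needed_writes_hz > 0u && slow_cycles_per_write > 0u);
    uint64_t min_slow_hz = (uint64_t)needed_writes_hz * slow_cycles_per_write;
    return (uint32_t)(clk_peri_hz / min_slow_hz);
}
```

With Clk_Peri at 75MHz and a requirement of 10 million writes per second at one bus cycle each, the divider can be at most 7, i.e. a Clk_Slow of about 10.7MHz, far above the 1MHz this post had in mind for the idle M0+.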

This seems to be a major misdesign of the PSoC 6, and certainly an omission in the already sparse documentation of this part of the system. Somebody show me if I am wrong on this, or better yet, what I can do to make it right.

End of my input. Cypress doesn't pay me enough for the time I've spent on this already. It would be so much easier to just put it in the docs.
