Why is the pin toggling frequency so low?

PiWy_2406846 · Jun 21, 2017

I have a FreeSoC2 development board with the 5888 part. MASTER_CLK comes from an external 24MHz crystal and is bumped up to 64MHz by the PLL; BUS_CLK == MASTER_CLK. This is a release build with LTO etc. enabled and the optimization goal set to speed. While debugging a soft 3-wire SPI implementation, I ended up with the following snippet:

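    for(;;) {
        /* Read-modify-write: drive the Soft_SCLK pin high. */
        Soft_SCLK_DR |= (1 << Soft_SCLK_SHIFT);
        /* Read-modify-write: drive the Soft_SCLK pin low again. */
        Soft_SCLK_DR &= ~(1 << Soft_SCLK_SHIFT);
    }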

The output frequency on the Soft_SCLK pin is 4MHz -- I expected it to be in the 15-20MHz ballpark. So why is it so slow?

PiWy_2406846 · Jun 21, 2017

It seems that my measurements are more or less consistent with Cypress's own application note:

http://www.cypress.com/file/45381

"Toggle GPIOs Faster with Data Registers"; please refer to figures 28 and 29. The API toggler produces a waveform with a 2us period, while a loop exactly equivalent to my code produces a period of 530ns. I don't see their clock settings mentioned, but my period is 250ns with a 64MHz clock.

So again, what mechanism causes even the faster toggler to be so slow? Please note that I am not asking about the fastest way to toggle a pin, because I can accomplish that easily with a UDB. I just want to fill a gap in my understanding of the circuits and activities behind software pin access.

Bob_Marlowe · Jun 21, 2017

Toggling a pin your way involves CPU action only. So the SYSCLK speed will have the most influence on execution time, followed by the instruction sequence needed.

I would suggest setting the build mode from "Debug" to "Release", which in turn will enable some code optimizations.

Nonetheless, the toggle testing is more or less meaningless: you do not have any CPU time left to do something else, and the pin will never give a 50% duty cycle (due to the needed branch instruction). So what does it help you, or what difference does it make, when you find a method to toggle a pin faster?

Bob

PiWy_2406846 · Jun 21, 2017

Bob,

as already mentioned, the numbers come from a release build with all the optimization options bumped to their highest levels (even including LTO, which should not change anything in this particular test case, BTW). The 64MHz 32-bit ARM CPU is able to produce exactly 4M pulses per second, which is equal to the abilities of an ATMega168 clocked at 16MHz. In Atmel's case the performance exactly matches the calculations based on the opcode definitions; in the case of the ARM, finding the answer by grepping the tons of documentation is beyond my current abilities. The math clearly shows that this loop on a PSoC 5LP needs 16 cycles per iteration (64MHz / 4MHz), which is kind of a lot. I'd like to know where these cycles go, hence the posted question.

The real problem comes from my research on whether a dedicated 3-wire SPI bus should be handled by fully software bit-banging or by a UDB-based accelerator (albeit a handcrafted one, as the SPI blocks from the Cypress library have insane resource footprints, not to mention the UARTs). My fear was that this implementation would be too fast, not too slow, and I wondered how exactly I should manage the setup/hold times. It turns out that the software implementation will be so unexpectedly slow that no waitstate management will be necessary. This is a dedicated system bus with real-time latency requirements (namely, an RTC to external atomic clock reference synchronizer with 15us accuracy), so the CPU would otherwise poll the UDB. Currently the software implementation wins, because it consumes no precious UDB resources and is sufficiently slow for the slave chip to handle.
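For illustration, a write-only software shift-out along these lines might look like the sketch below. Everything in it is an assumption rather than the poster's actual code: the `soft_spi_t` struct, the `soft_spi_write_byte` name, MSB-first bit order, and a slave that samples on the rising clock edge are all hypothetical. The registers are passed as plain pointers so the routine can be tried on a host; on the target they would point at the pin data registers such as Soft_SCLK_DR. No delay is inserted between edges, which matches the observation that the register accesses alone are slow enough.

```c
#include <stdint.h>

/* Hypothetical bus description: pointers to the pin data registers and the
 * bit masks of the clock and data pins within them. */
typedef struct {
    volatile uint8_t *sclk_reg;  /* register holding the clock pin */
    uint8_t           sclk_mask; /* bit mask of the clock pin */
    volatile uint8_t *data_reg;  /* register holding the data pin */
    uint8_t           data_mask; /* bit mask of the data pin */
} soft_spi_t;

/* Shift one byte out, MSB first. The data pin is updated while the clock is
 * low; the slave is assumed to sample on the rising edge. */
static void soft_spi_write_byte(const soft_spi_t *bus, uint8_t value)
{
    for (int bit = 7; bit >= 0; --bit) {
        if (value & (1u << bit)) {
            *bus->data_reg |= bus->data_mask;
        } else {
            *bus->data_reg &= (uint8_t)~bus->data_mask;
        }
        *bus->sclk_reg |= bus->sclk_mask;            /* rising edge */
        *bus->sclk_reg &= (uint8_t)~bus->sclk_mask;  /* falling edge */
    }
}
```

Each bit costs at least the two read-modify-write clock accesses from the test loop plus the data pin update, so the bit rate of such a routine would sit below the 4MHz measured for the bare toggling loop.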

odissey1 · Jun 21, 2017

Piotr, the max frequency obtainable by pin toggling in software is BUS_CLK/5 = 64MHz/5 = 12.8MHz. https://vimeo.com/65092394
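To put the thread's numbers side by side, here is a small sketch of the arithmetic. The function names are made up for illustration, and the divisor of 5 is simply the figure quoted above, not something derived here.

```c
#include <stdint.h>

/* Bus clock cycles spent per full output period (one high and one low
 * phase) for a measured pin frequency. */
static uint32_t cycles_per_period(uint32_t bus_hz, uint32_t pin_hz)
{
    return bus_hz / pin_hz;
}

/* Upper bound quoted in the thread: BUS_CLK / 5. */
static uint32_t max_toggle_hz(uint32_t bus_hz)
{
    return bus_hz / 5u;
}
```

With a 64MHz bus clock, `cycles_per_period(64000000u, 4000000u)` gives the 16 cycles measured for the read-modify-write loop, while `max_toggle_hz(64000000u)` gives the quoted 12.8MHz bound, which corresponds to 5 cycles per period.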
