7 Replies Latest reply on Oct 22, 2019 6:14 AM by UrPl_1236626

    PSoC6 Memory and DMA Performance

    UrPl_1236626

      Hi,

       

      I could not find much information in the TRM about PSoC6 memory and DMA performance. More precisely:

       

      1. How is memory shared between the two processors and the DMA?

      2. How many wait states are expected at what speeds, and are there separate memory regions that allow the DMA to transfer data between memory and peripherals without stalling the CPU?

      3. Regarding the NV (flash) memory, the only wait-state definition I found in the datasheet is 3 wait states at a 100 MHz peripheral clock and 2 at 50 MHz per_clk, so I assume the maximum flash speed is 33 MHz; thus for a 66 MHz per_clk the flash wait states should also be 2.
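The inference in point 3 can be written down as a small sketch. It encodes only the poster's assumption (one flash access per roughly 33.3 MHz period, rounded up), not a datasheet formula; the function name and the 150 MHz sanity limit are hypothetical. Actual wait states should be taken from the datasheet for the device and operating voltage in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the question, NOT from the datasheet:
 * the flash completes one access per ~33.3 MHz period, so
 *   wait_states = ceil(clk_mhz * 3 / 100).
 * This reproduces the two quoted points (100 MHz -> 3, 50 MHz -> 2). */
static uint32_t flash_wait_states_guess(uint32_t clk_mhz)
{
    /* Hypothetical sanity limit, only to keep the arithmetic meaningful. */
    assert(clk_mhz > 0u && clk_mhz <= 150u);

    /* Integer ceiling of (clk_mhz * 3) / 100. */
    return (clk_mhz * 3u + 99u) / 100u;
}
```

Under this assumption, `flash_wait_states_guess(66u)` evaluates to 2, matching the guess in the question.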

       

      Thank you, Uros

        • 1. Re: PSoC6 Memory and DMA Performance
          BragadeeshV_41

          Hi UrPl_1236626,

           

          1. Both cores share the same memory regions, as given in the following table:

          If you want to know the exact locations in which each core stores its program data, you can refer to the linker script files and look for the memory regions defined there. Adding screenshots of the same.

          [Screenshots: linker script memory regions for CM0+ and CM4]

           

          All the peripheral registers can be accessed by both cores over the AHB bus. If both cores try to access the same memory location at the same time, the accesses collide and arbitration takes place. You can implement semaphores using IPC to lock access to resources shared by both cores.

          We also have an app note and an associated code example that showcase an IPC semaphore/mutex on PSoC 6:

          App note: https://www.cypress.com/documentation/application-notes/an215656-psoc-6-mcu-dual-cpu-system-design

          CE:  https://www.cypress.com/documentation/code-examples/ce223549-psoc-6-mcu-ipc-semaphore
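As a rough illustration of the locking idea only, the following host-side sketch models one semaphore bit with C11 atomics. It is not the PDL IPC driver and does not reproduce its API; all names here are hypothetical. On the device, the IPC semaphore driver shown in the linked code example would take the place of these helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one hardware semaphore bit shared by two cores. */
typedef struct {
    atomic_bool taken;
} sema_model_t;

static void sema_model_init(sema_model_t *s)
{
    assert(s != NULL);
    atomic_init(&s->taken, false);
}

/* Non-blocking acquire: returns true if the caller now owns the semaphore,
 * false if the other side already holds it. */
static bool sema_model_try_acquire(sema_model_t *s)
{
    assert(s != NULL);
    /* Atomic exchange returns the previous value; false means it was free. */
    return !atomic_exchange_explicit(&s->taken, true, memory_order_acquire);
}

/* Release: makes writes to the shared resource visible before unlocking. */
static void sema_model_release(sema_model_t *s)
{
    assert(s != NULL);
    atomic_store_explicit(&s->taken, false, memory_order_release);
}
```

A core would call `sema_model_try_acquire()` before touching the shared resource, retry or do other work if it returns false, and call `sema_model_release()` when done.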

           

          You can also define custom sections in linker scripts to share a memory location between two cores. The following thread discusses the same:

          Re: How to set up shared memory between CM4 and CM0p

           

          You can restrict access to memory and peripheral regions using the memory protection units available in both cores. Refer to the architecture TRM for more information on the protection units.

           

          2. The wait states depend on bus latency, including arbitration (when the CPU and DMA access the same peripheral or memory location at the same time), so they are application-specific. Could you please elaborate on the second part of your question? The DMA can transfer data between memory and peripheral locations without CPU intervention. The CPU is involved only when loading the descriptor information into the associated registers for the first time, or when reading the DMA status and control registers at the end of a transfer. At all other times, the CPU can perform other tasks while the DMA is executing. If both try to access the same resource (location), however, bus arbitration comes into play.

           

          3. I'm not sure we can directly draw that inference from this information. Such figures can't be assumed unless they are specified by characterization. I'll check on this internally and get back to you.

           

          Regards,

          Bragadeesh

          • 2. Re: PSoC6 Memory and DMA Performance
            BragadeeshV_41

            The wait states depend on the frequency and voltage range in which the device operates. The Taa time (access time from address) also varies by device. You are right: if the operating voltage is 1.1 V, the flash needs 2 wait states at 66 MHz.

             

            Regards,

            Bragadeesh

            • 3. Re: PSoC6 Memory and DMA Performance
              UrPl_1236626

              Hi Bragadeesh,

               

              Thank you for your answers. My question was directed more toward the wait states that a CPU or the DMA sees when doing a transfer. For instance:

               

              • Is all the memory a single segment with a single data-transfer port, or is it divided into segments that operate in parallel?
              • If it is a single segment, both cores and the DMA compete for the same memory bus, so there are many wait states and the effective speed drops significantly.
              • If there are multiple segments, where are the boundaries of these memory regions, so that data can be placed properly to allow parallel operation?

               

              The other thing we cannot figure out is what could stall the highest-priority DMA channel in the single-core PSoC6 for 5 us. In the screenshot below, the top trace is the UDB FIFO-empty signal, whose second pulse is occasionally prolonged while waiting for the DMA to be served; yellow is the output clock of the bi-mode master transfer; and green is slave-select, which returns high because of insufficient data in the FIFO.

              Unknown-Source-of-Jitter.jpg

              • The CPU speed is 126 MHz and clk_per is 63 MHz.
              • There are 17 DMA channels. Only one of them has the higher priority 0; the rest have priority 3. Each DMA channel transfers 4 bytes from a UDB FIFO at once.
              • The higher-priority channel 0 sees a jitter of 5 us, but if and only if the UART is enabled, even when it is doing nothing (no RX/TX transfer). If the UART is completely disabled, we never see the 5 us jitter (and this UART does not use any DMA channels).
              • Is there anything in the PSoC6 that can execute above the DMA channels and introduce such a long jitter? Is there any PSoC6 microcode executed in the background?
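To put the 5 us figure in terms of bus time, the arithmetic can be sketched as below (the helper name is hypothetical). With the clock values quoted in this post, 5 us corresponds to 315 cycles of the 63 MHz clk_per and 630 cycles of the 126 MHz CPU clock.

```c
#include <assert.h>
#include <stdint.h>

/* Number of clock cycles that elapse during a delay:
 * microseconds * MHz = cycles (the 1e-6 and 1e6 factors cancel). */
static uint32_t cycles_in(uint32_t delay_us, uint32_t clk_mhz)
{
    /* Guard against 32-bit overflow of the product. */
    assert(delay_us == 0u || clk_mhz <= UINT32_MAX / delay_us);
    return delay_us * clk_mhz;
}
```

For example, `cycles_in(5u, 63u)` gives the clk_per cycle count for the observed stall.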

               

              So in general we find the memory bus multiplexing inside the PSoC6 a bit foggy and unpredictable for use in real-time systems. More detailed information about the internal memory bus structure, segmentation, single/dual memory ports, arbitration with wait states, and so on would be nice to have.

               

              Kind regards,
              Uros

              • 4. Re: PSoC6 Memory and DMA Performance
                BragadeeshV_41

                Hi,

                 

                Is it possible for you to share a stripped down version of your project showing the DMA and UART issue? We will try to reproduce the issue from our side and check what's the root cause of this issue.

                 

                Regards,

                Bragadeesh

                • 5. Re: PSoC6 Memory and DMA Performance
                  UrPl_1236626

                  Hi,

                   

                  We will distribute DMA channels among the two DMAs.

                   

                  Can we assume all Masters on AHB are arbitrated in Round-Robin mode?

                   

                  Best regards,
                  Uros

                  • 6. Re: PSoC6 Memory and DMA Performance
                    BragadeeshV_41

                    Hi UrPl_1236626,

                     

                    By default, all the bus masters on the AHB-Lite bus have the same priority of 3. In case of collisions, arbitration takes place in round-robin fashion. Each bus master has a 2-bit priority level (“0” is the highest priority level and “3” is the lowest priority level). The priority of the DMA master can be increased by changing the priority in the MSx_CTL register. Can you please let us know the exact device you are using in your application?
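The scheme described above (a 2-bit priority per master, round-robin among masters of equal priority) can be modelled on a host as follows. This is a behavioural sketch under that description only, not the actual arbiter logic of the device; the master count and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_MASTERS 4u  /* hypothetical number of bus masters */

typedef struct {
    uint8_t prio[NUM_MASTERS];  /* 0 = highest priority, 3 = lowest */
    size_t  last_grant;         /* master that was granted most recently */
} arbiter_model_t;

/* All masters start at the default priority of 3. */
static void arbiter_init(arbiter_model_t *a)
{
    assert(a != NULL);
    for (size_t i = 0; i < NUM_MASTERS; i++) {
        a->prio[i] = 3u;
    }
    a->last_grant = NUM_MASTERS - 1u;  /* so master 0 is first in rotation */
}

/* Returns the index of the granted master, or -1 if nobody is requesting.
 * The scan starts just after the last granted master, and a later candidate
 * wins only with a strictly better priority, so equal-priority requesters
 * are served in round-robin order. */
static int arbiter_grant(arbiter_model_t *a, const bool req[NUM_MASTERS])
{
    assert(a != NULL && req != NULL);
    int winner = -1;
    for (size_t step = 1; step <= NUM_MASTERS; step++) {
        size_t i = (a->last_grant + step) % NUM_MASTERS;
        if (!req[i]) {
            continue;
        }
        if (winner < 0 || a->prio[i] < a->prio[(size_t)winner]) {
            winner = (int)i;
        }
    }
    if (winner >= 0) {
        a->last_grant = (size_t)winner;
    }
    return winner;
}
```

With every master requesting at the default priority, successive calls grant masters 0, 1, 2, 3 in turn; raising one master to priority 0 makes it win every collision in which it takes part.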

                     

                    Regards,

                    Bragadeesh

                    • 7. Re: PSoC6 Memory and DMA Performance
                      UrPl_1236626

                      Thank you Bragadeesh for your response.

                       

                      We're using the single-core PSoC6 CY8C6136BZI-F34, and with all this information we are able to control the latency.

                       

                      Best regards,
                      Uros