I have tested the transfer speed of the SDHC controller, which has a dedicated DMA engine, and of the Octal SPI port, which uses the general-purpose DMA engines, DMA or DMAC (the DMAC, present in some chips, has caching that accelerates the transfer). The maximal sustained read speed from the SD card at a 100 MHz clock was 44.766 MBps and the write speed was 36.656 MBps, where 1 MB = 1,000,000 bytes. The maximal read speed from Octal SPI at a 50 MHz clock using the DMAC was 32.576 MBps. The same test done with DMA (without caching) gives about 1 MBps less. As you see, these values are in the same range as your memory-to-memory transfers. A detailed description can be found in this thread: Re: CY8CPROTO-062-4343w SD card
I am also very curious to know what is the maximal achievable data rate in DMA transfers in PSoC 6.
If no workable suggestion for accelerating DMA turns up, you may want to consider migrating to another platform, since a sustained rate of 64 MBps may be difficult to achieve on PSoC 6.
All the best, Alexei
Please refer to the DMA performance section in the respective device architecture TRM.
A 1D transfer takes 12 + n * 3 + m cycles, where n is the number of data elements (256 in your case) and m is the number of wait cycles incurred on the bus due to arbitration. Theoretically, if there are no wait states (the transfer is not preempted and does not lose arbitration), you can expect your transaction to complete in 780 cycles.
Please note that the DMA operates in the clk_slow domain. Make sure Clk_Slow is configured to 100 MHz if you need higher throughput.
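The formula above can be sketched as a small helper for estimating transfer time. This is only an illustration of the arithmetic quoted from the TRM; the function names are hypothetical, and the wait-cycle count m has to be measured or estimated separately.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle model quoted in the post: 12 + n * 3 + m.
 * n_elements  = number of data elements in the 1D transfer
 * wait_cycles = bus wait cycles lost to arbitration (m) */
static uint32_t dma_1d_cycles(uint32_t n_elements, uint32_t wait_cycles)
{
    return 12u + 3u * n_elements + wait_cycles;
}

/* Convert a cycle count to microseconds for a given clk_slow in MHz. */
static double dma_cycles_to_us(uint32_t cycles, double clk_slow_mhz)
{
    assert(clk_slow_mhz > 0.0);
    return (double)cycles / clk_slow_mhz;
}
```

For n = 256 and m = 0 this gives 780 cycles, which is 7.8 us at a 100 MHz clk_slow.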
Can you let us know how you calculated 9.14 MHz from your transfer?
I performed a 256-word transfer from SRAM to SRAM and was able to complete it in ~10 us, which is ~1000 cycles at a 100 MHz clock.
In a 1D transfer, only the first element takes 15 cycles (12 + 3*1); each remaining element takes only 3 cycles, assuming that your channel is the highest-priority channel and incurs no additional wait cycles while accessing the source/destination buffers.
The user "svme" wrote that a 32-bit transfer was used. This means that two words (2 * 16-bit) are transmitted at once, so according to the given formula the number of clocks should be 12 + 128*3 = 396. Your measurement gives 512 bytes * 1e5 = 51.2 MBps (512 bytes every 10 us). This is not bad, but the theoretical maximum for large blocks is 4/3 * 100e6 = 133.3 MBps. The question is how to achieve this theoretical maximum.
The 32-bit transfer works for sure. If we look at the Octal SPI clock, we see that the four clocks corresponding to four bytes always come together. Please see the pictures:
In this test the SPI clock was 50 MHz. However, if we increase it to 80 MHz, the data rate will hardly change, because the time interval between the starts of 32-bit packets stays the same, about 121-136 ns.
What maximal data rate can you measure for an SRAM-to-SRAM 32-bit transfer of large packets?
It seems that the given "theoretical" rate is difficult to achieve.
Basically, the question is whether the given theoretical equation is correct.
SvMe_4718011 and I have both performed 32-bit transfers (one word = 4 bytes). Each word (4 bytes) takes 3 cycles to transfer. Therefore, when you are transferring 256 words (256 x 4 = 1024 bytes), you need to use n = 256. This gives a total of 780 cycles plus wait states (12 + 256 * 3 + m), which works out to around 131.3 MBps (excluding wait cycles).
In practice I got ~1000 cycles (measured using a timer block). Data rate = 100 * 10^6 * 1024 / 1000 = 102.4 MBps.
And yes, if I had done a 16-bit transfer of 256 half words, I would arrive at ~51 MBps. Hence, for the same number of elements, 32-bit transfers have 4 times the throughput of 8-bit transfers and 2 times the throughput of 16-bit transfers.
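The data-rate arithmetic used in these posts can be written as a one-line helper. This is a minimal sketch with a hypothetical function name; it uses the thread's convention of 1 MB = 1,000,000 bytes, so bytes per cycle multiplied by the clock in MHz gives MBps directly.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in MBps (1 MB = 1,000,000 bytes).
 * bytes / cycles is bytes per clock; multiplying by the clock
 * frequency in MHz yields millions of bytes per second. */
static double dma_throughput_mbps(uint64_t bytes, uint64_t cycles,
                                  double clk_mhz)
{
    assert(cycles > 0u);
    return (double)bytes / (double)cycles * clk_mhz;
}
```

With 1024 bytes at 100 MHz, 1000 measured cycles gives 102.4 MBps, while the 780-cycle theoretical count gives roughly 131 MBps.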
Next I increased the number of elements to 65536 words (65536 * 4 = 262144 bytes) to perform a 2D transfer (the maximum possible size per descriptor per trigger). Now the formula is 13 + 65536 * 3 = 196621 cycles. Theoretically, this would give us a data rate of (100 * 10^6 * 262144) / 196621 = 133.32 MBps.
I ran this 2D transfer of 65536 words on PSoC 6 and measured around 262177 cycles. This gives (100 * 10^6) * 262144 / 262177 = 99.99 MBps.
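To compare the measurement with the formula, it can help to back out the per-element cost from the total cycle count. The sketch below uses a hypothetical helper name and takes the fixed setup overhead (13 cycles for the 2D case quoted above) as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles spent per element after subtracting the fixed
 * descriptor setup overhead from the total cycle count. */
static double dma_cycles_per_element(uint64_t total_cycles,
                                     uint64_t setup_cycles,
                                     uint64_t n_elements)
{
    assert(n_elements > 0u);
    assert(total_cycles >= setup_cycles);
    return (double)(total_cycles - setup_cycles) / (double)n_elements;
}
```

For the theoretical 196621 cycles this returns 3.0 cycles per element; for the measured 262177 cycles it returns just over 4.0, which suggests about one extra cycle per 32-bit element in this test.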
For the factors affecting the wait states, please read the DMA performance section in the architecture TRM. This is explained in detail.
Thank you for the clarification. Perhaps it would be better to use DWORD or uint32 instead of "word" to avoid confusion. As you see, your measurement has shown that a RAM-to-RAM transfer of a uint32 needs 4 clocks and not 3 as stated in the TRM. In any case, 3 and 4 clocks are close values and the difference does not matter fundamentally in most cases. Much more unpleasant is that an Octal SPI DMAC transfer needs 12-13 clocks instead of the expected 3-4. For other peripherals such delays are not essential, but Octal SPI is the only interface that is well suited for attaching an FPGA, so a high data rate is desired here. Did you ever measure Octal SPI DMA data rates?
With best regards, Alexei
I found a solution that gets it down to about 7.8 us for 256 words (32-bit). I set the priority of DMA0 to the highest with the following command:
"Cy_Prot_ConfigBusMaster(CPUSS_MS_ID_DW0, false, true, 0);"
Unfortunately, the priority is not a parameter in the API, but this function sets it to the highest.
You can use the PROT_SMPU_MSx_CTL[PRIO] register field (x = 0, 1, 2, ...) to set the priority level of each bus master. Refer to the PSoC 6 register TRM for details on this register.
Refer to the device config header file (\Generated_Source\PSoC6\pdl\devices\psoc6\include\psoc6_01_config.h) for the master IDs needed to set the appropriate priority level for each master.
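A read-modify-write of the PRIO field might look like the sketch below. It works on a plain 32-bit value rather than on the hardware register, and the field position and width (2 bits at [9:8]) are assumptions that must be checked against the register TRM; the macro and function names are hypothetical and are not part of the PDL.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: PRIO is a 2-bit field at bits [9:8] of
 * PROT_SMPU_MSx_CTL. Verify against the PSoC 6 register TRM. */
#define MS_CTL_PRIO_POS  8u
#define MS_CTL_PRIO_MSK  (0x3u << MS_CTL_PRIO_POS)

/* Return ctl_value with only the PRIO field replaced by prio;
 * all other bits are preserved. */
static uint32_t ms_ctl_with_prio(uint32_t ctl_value, uint32_t prio)
{
    assert(prio <= 3u);
    return (ctl_value & ~MS_CTL_PRIO_MSK) | (prio << MS_CTL_PRIO_POS);
}
```

On target, the value read from the register for the chosen master would be passed through this helper and written back, leaving the other control bits of that master unchanged.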