Len_CONSULTRON

I've attached a project that has DMA-driven function equivalents for:

  • memset() => dmemset()
  • memcpy() => dmemcpy()
  • memmove() => dmemmove()

It uses a single DMA channel and TD resource for these functions.

The purpose of this project is to see how much faster these replacement functions can transfer data.

Summary: Transfers of 64 bytes or more (at BUS_CLK = 72 MHz) are faster than the standard string library equivalents.

At a transfer size of 4095 bytes (maximum DMA TD transfer count), the transfer time is over 7 times faster!
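Since one TD moves at most 4095 bytes, a larger buffer would have to be split across several transfers. A minimal host-side sketch of that splitting follows. It assumes a copy routine with a memcpy()-style signature; `copy_fn_t`, `chunked_copy` and `DMA_TD_MAX_BYTES` are hypothetical names, not taken from the attached project, which may handle large transfers differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* memcpy() can serve as a host-side stand-in for the copy routine */

/* Maximum byte count of one DMA TD, as stated in the post. */
#define DMA_TD_MAX_BYTES 4095u

/* Hypothetical signature for a single-TD copy routine such as dmemcpy(). */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/* Copy n bytes with one call per chunk of at most DMA_TD_MAX_BYTES.
 * Returns the number of chunks issued. */
size_t chunked_copy(copy_fn_t copy, void *dst, const void *src, size_t n)
{
    assert(copy != NULL);
    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t chunks = 0;

    while (n > 0) {
        size_t len = (n > DMA_TD_MAX_BYTES) ? DMA_TD_MAX_BYTES : n;
        copy(d, s, len);
        d += len;
        s += len;
        n -= len;
        chunks++;
    }
    return chunks;
}
```

On a PC, passing `memcpy` as the `copy` argument exercises the splitting logic without any DMA hardware.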


This project is written for a PSoC5 but can easily be converted to other PSoC families that have DMA resources.

I welcome any suggestions for improvements.

I'm also considering making this into an installable component. Your thoughts?

Enjoy!

 

Len
"Engineering is an Art. The Art of Compromise."
odissey1

Len,

I get even higher numbers: for 4095 bytes, the speed-up ratio memset/dmemset = 8.17, and memcpy/dmemcpy = 10.4. Data are  attached.

I used the StopWatch component instead of toggled Pins to measure the elapsed BUS_CLK cycles, and SerialPlot to output the results. The project is attached; all components are included in the project.

/odissey1

Table 1. Elapsed BUS_CLK cycles for different block sizes in bytes (KIT-059, Creator 4.0)

size, memset, memcpy, dmemset, dmemcpy
1, 34, 34, 382, 375
2, 37, 42, 366, 342
4, 48, 56, 349, 340
8, 74, 92, 354, 365
16, 120, 156, 350, 339
32, 220, 290, 369, 360
64, 405, 552, 359, 364
128, 1034, 1080, 364, 387
256, 1560, 2136, 461, 455
512, 3097, 4260, 762, 682
1024, 6165, 8681, 990, 1002
2048, 12539, 17368, 1692, 1693
4095, 25265, 34707, 3092, 3342

Figure 1. Elapsed clocks for size 1, 2, 4..4095. Red - memset, Blue - memcpy, Green - dmemset, Fuchsia - dmemcpy
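The speed-up ratios and the break-even size can be recomputed from Table 1 on a PC. The sketch below copies the table values verbatim; the type and function names (`row_t`, `memset_speedup`, `memcpy_speedup`, `crossover_size`) are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row of Table 1: block size and elapsed BUS_CLK cycles per function. */
typedef struct {
    unsigned size, memset_clk, memcpy_clk, dmemset_clk, dmemcpy_clk;
} row_t;

static const row_t table1[] = {
    {    1,    34,    34,  382,  375 }, {    2,    37,    42,  366,  342 },
    {    4,    48,    56,  349,  340 }, {    8,    74,    92,  354,  365 },
    {   16,   120,   156,  350,  339 }, {   32,   220,   290,  369,  360 },
    {   64,   405,   552,  359,  364 }, {  128,  1034,  1080,  364,  387 },
    {  256,  1560,  2136,  461,  455 }, {  512,  3097,  4260,  762,  682 },
    { 1024,  6165,  8681,  990, 1002 }, { 2048, 12539, 17368, 1692, 1693 },
    { 4095, 25265, 34707, 3092, 3342 },
};
#define TABLE1_ROWS (sizeof table1 / sizeof table1[0])

/* Ratio of standard-library cycles to DMA cycles for one row. */
double memset_speedup(const row_t *r)
{
    assert(r->dmemset_clk != 0);
    return (double)r->memset_clk / (double)r->dmemset_clk;
}

double memcpy_speedup(const row_t *r)
{
    assert(r->dmemcpy_clk != 0);
    return (double)r->memcpy_clk / (double)r->dmemcpy_clk;
}

/* Smallest tabulated size at which the DMA version takes fewer cycles.
 * use_memcpy selects the memcpy/dmemcpy pair, otherwise memset/dmemset.
 * Returns 0 if the DMA version is never faster. */
unsigned crossover_size(int use_memcpy)
{
    for (size_t i = 0; i < TABLE1_ROWS; i++) {
        const row_t *r = &table1[i];
        unsigned std_clk = use_memcpy ? r->memcpy_clk : r->memset_clk;
        unsigned dma_clk = use_memcpy ? r->dmemcpy_clk : r->dmemset_clk;
        if (dma_clk < std_clk)
            return r->size;
    }
    return 0;
}
```

With these numbers the break-even point for both pairs falls at the 64-byte row, which agrees with the summary in the original post.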


/odissey1,

Thank you for your results and modified program.  I'll have to check it out.

I was using a framing pulse and a scope to derive my timing results. My memset()/dmemset() ratio is virtually identical to yours (8.10x); however, my memcpy()/dmemcpy() ratio was closer to 7.12x.

In general, the actual DMA transfer was significantly faster. The additional time in each function is CPU time spent setting up the DMA and then waiting for it to complete. The actual DMA transfer accounts for about 90% of the time I logged for a large transfer (4095 bytes). For smaller transfers the CPU overhead is more significant, which is why 64 bytes is the practical lower limit (memcpy()/dmemcpy() ~ 1) for these DMA-driven functions compared to the standard functions.
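One way to act on that break-even point is a wrapper that falls back to memcpy() for small blocks and only uses DMA at or above the threshold. The following is a sketch under assumptions: `copy_auto`, `DMA_MIN_BYTES`, `dmemcpy_stub` and `dma_calls` are hypothetical names, and the stub simply forwards to memcpy() so the logic can be tried on a PC. On the target, the stub would be replaced by the real dmemcpy(), whose exact signature is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Break-even size reported in the thread for BUS_CLK = 72 MHz. */
#define DMA_MIN_BYTES 64u

/* Counts how often the DMA path was taken (host-side instrumentation only). */
unsigned dma_calls = 0;

/* Hypothetical stand-in for the DMA-driven copy; forwards to memcpy() on a PC. */
static void *dmemcpy_stub(void *dst, const void *src, size_t n)
{
    dma_calls++;
    return memcpy(dst, src, n);
}

/* Use the CPU copy below the break-even size and the DMA path otherwise. */
void *copy_auto(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n < DMA_MIN_BYTES)
        return memcpy(dst, src, n);
    return dmemcpy_stub(dst, src, n);
}
```

The threshold depends on clock settings and on how the setup overhead is measured, so it is worth re-measuring on the actual board rather than hard-coding 64.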

On a related note: For fun, I'm trying to implement a DMA-driven equivalent of the strlen() function. I have the DMA hardware state machine working for the comparison against the NUL character ('\0') at the end of the string. However, when my digital comparator component finds it and drives an active '1' onto the TRQ input of the DMA, the DMA transfer does not terminate. I've never needed to work with TRQ before, so I assume I set something up incorrectly.

If you're willing to take a look at it, I would appreciate it.  I posted my project at the following link:

Looking-for-proper-use-of-CyDmaTdGetConfiguration 

 

Len
"Engineering is an Art. The Art of Compromise."

Len,

I believe that the DMA "trq" input takes effect only if it is HIGH during the rising edge of the DMA "drq" signal. That is rather hard to synchronize and not very useful, and I have not seen it being used.

 

I used the StopWatch component because it provides 1 BUS_CLK accuracy, whereas Pin_Write() takes about 1 µs.


/odissey1,

The termination of the DMA has the following definition:

CY_DMA_TD_TERMIN_EN - Terminate this TD if a positive edge on the trq input line occurs. The positive edge must occur during a burst. That is the only time the DMAC will listen for it.

The burst for strlen() is continuous for up to 4095 bytes.  Based on the datasheet, I don't see why a rising edge on TRQ during a continuous burst wouldn't terminate the TD.
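For checking a DMA-driven strlen() once the TRQ termination works, a host-side reference model of the intended result may be useful. This sketch models only the expected count (bytes scanned before the first '\0', capped at one TD of 4095 bytes); it says nothing about the DMAC's TRQ timing. `strlen_td_model` and `DMA_TD_MAX_BYTES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum byte count of one DMA TD. */
#define DMA_TD_MAX_BYTES 4095u

/* Count the bytes before the first '\0', scanning at most DMA_TD_MAX_BYTES
 * bytes, as a single TD terminated by TRQ would be expected to do.
 * Returns DMA_TD_MAX_BYTES if no terminator is found within that range. */
size_t strlen_td_model(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (n < DMA_TD_MAX_BYTES && s[n] != '\0')
        n++;
    return n;
}
```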

Len
"Engineering is an Art. The Art of Compromise."