I've attached a project that has DMA-driven function equivalents for:
- memset() => dmemset()
- memcpy() => dmemcpy()
- memmove() => dmemmove()
It uses a single DMA channel and TD resource for these functions.
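To show how one channel and one TD can serve all three functions, here is a host-side behavioral sketch. It assumes the DMA moves data strictly forward, one byte at a time. `td_model`, `td_run()` and the `sim_` functions are hypothetical names for illustration; they are not the PSoC cy_dma API and not the code in the attached project. dmemset() keeps the source address fixed on a single fill byte, while dmemcpy() advances both addresses. For dmemmove(), a forward copy is only unsafe when the destination starts inside the source block. In that case the sketch copies chunks no larger than the overlap distance, starting from the end of the block.

```c
#include <assert.h>
#include <stdint.h>

#define TD_MAX_COUNT 4095u  /* PSoC5 TD transfer-count limit */

/* Hypothetical host-side model of one DMA transaction descriptor (TD).
 * Not the PSoC cy_dma API: it only mimics the address-increment options. */
typedef struct {
    const uint8_t *src;
    uint8_t *dst;
    uint16_t count;
    int inc_src;   /* 1: source address advances after each byte */
    int inc_dst;   /* 1: destination address advances after each byte */
} td_model;

/* Execute the TD as a strictly forward, byte-by-byte transfer. */
static void td_run(const td_model *td)
{
    assert(td->count <= TD_MAX_COUNT);
    const uint8_t *s = td->src;
    uint8_t *d = td->dst;
    for (uint16_t i = 0; i < td->count; i++) {
        *d = *s;
        if (td->inc_src) s++;
        if (td->inc_dst) d++;
    }
}

/* memset: the source stays on one byte that holds the fill value. */
static void *sim_dmemset(void *dst, int value, uint16_t n)
{
    uint8_t fill = (uint8_t)value;
    td_model td = { &fill, (uint8_t *)dst, n, 0, 1 };
    td_run(&td);
    return dst;
}

/* memcpy: both addresses advance. */
static void *sim_dmemcpy(void *dst, const void *src, uint16_t n)
{
    td_model td = { (const uint8_t *)src, (uint8_t *)dst, n, 1, 1 };
    td_run(&td);
    return dst;
}

/* memmove: a forward transfer is only unsafe when dst lies inside
 * (src, src + n). Then copy chunks no larger than (dst - src), starting
 * at the end of the block, so no chunk reads bytes already overwritten. */
static void *sim_dmemmove(void *dst, const void *src, uint16_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    if (d <= s || d >= s + n) {
        return sim_dmemcpy(dst, src, n);   /* one forward TD suffices */
    }
    uint16_t gap = (uint16_t)(d - s);      /* 0 < gap < n */
    uint16_t remaining = n;
    while (remaining > 0) {
        uint16_t chunk = remaining < gap ? remaining : gap;
        remaining = (uint16_t)(remaining - chunk);
        td_model td = { (const uint8_t *)src + remaining,
                        (uint8_t *)dst + remaining, chunk, 1, 1 };
        td_run(&td);
    }
    return dst;
}
```

Note that a small overlap distance turns one move into many short TDs, each paying the setup cost, so a CPU memmove() is likely the better choice there. How the attached project actually handles overlap is not stated in the post.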
The purpose of this project is to measure how much faster data can be transferred with these replacement functions.
Summary: For transfer sizes of 64 bytes or more (at BUS_CLK = 72 MHz), the DMA-driven functions are faster than their standard string library equivalents.
At a transfer size of 4095 bytes (the maximum DMA TD transfer count), the transfer is over 7 times faster!
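The two numbers in the summary suggest a hybrid wrapper: use the CPU below the measured 64-byte break-even point, and split anything larger than 4095 bytes into several TD-sized pieces. The sketch below assumes exactly that. `dma_copy_one_td()` is a hypothetical stand-in for "start one TD and wait for it" (here just a memcpy() so it runs on a PC), and `hybrid_memcpy()` is an illustrative name, not part of the attached project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TD_MAX_COUNT   4095u  /* max bytes per PSoC5 TD */
#define DMA_THRESHOLD  64u    /* measured break-even size at BUS_CLK = 72 MHz */

/* Counts how many TDs were "issued", for illustration only. */
static unsigned tds_issued = 0;

/* Hypothetical stand-in for running one DMA TD to completion.
 * On hardware this would be the DMA-driven copy; here it is memcpy(). */
static void dma_copy_one_td(void *dst, const void *src, uint16_t n)
{
    assert(n > 0 && n <= TD_MAX_COUNT);
    memcpy(dst, src, n);
    tds_issued++;
}

/* CPU copy for small blocks; DMA for large ones, split into pieces of at
 * most TD_MAX_COUNT bytes because a single TD cannot move more. */
static void *hybrid_memcpy(void *dst, const void *src, size_t n)
{
    if (n < DMA_THRESHOLD) {
        return memcpy(dst, src, n);
    }
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (n > 0) {
        uint16_t chunk = (uint16_t)(n > TD_MAX_COUNT ? TD_MAX_COUNT : n);
        dma_copy_one_td(d, s, chunk);
        d += chunk;
        s += chunk;
        n -= chunk;
    }
    return dst;
}
```

A 10000-byte copy becomes three TDs (4095 + 4095 + 1810), each paying the setup overhead once. Chaining several TDs would be an alternative, at the cost of more TD resources than the single TD the project uses.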
This project is written for a PSoC5 but can easily be converted to other PSoC families that have DMA resources.
I welcome any suggestions for improvements.
I'm also considering making this into an installable component. Your thoughts?
Enjoy!
"Engineering is an Art. The Art of Compromise."
- Labels: Code examples, DMA, string library
Len,
I get even higher numbers: for 4095 bytes, the speed-up ratio memset/dmemset = 8.17, and memcpy/dmemcpy = 10.4. Data are attached.
I used the StopWatch component instead of toggling pins to measure the elapsed BUS_CLK cycles, and SerialPlot to output the results. The project is attached; all components are included in the project.
/odissey1
Table 1. Elapsed BUS_CLK cycles for different block sizes (KIT-059, Creator 4.0)

| Size (bytes) | memset | memcpy | dmemset | dmemcpy |
|---:|---:|---:|---:|---:|
| 1 | 34 | 34 | 382 | 375 |
| 2 | 37 | 42 | 366 | 342 |
| 4 | 48 | 56 | 349 | 340 |
| 8 | 74 | 92 | 354 | 365 |
| 16 | 120 | 156 | 350 | 339 |
| 32 | 220 | 290 | 369 | 360 |
| 64 | 405 | 552 | 359 | 364 |
| 128 | 1034 | 1080 | 364 | 387 |
| 256 | 1560 | 2136 | 461 | 455 |
| 512 | 3097 | 4260 | 762 | 682 |
| 1024 | 6165 | 8681 | 990 | 1002 |
| 2048 | 12539 | 17368 | 1692 | 1693 |
| 4095 | 25265 | 34707 | 3092 | 3342 |
Figure 1. Elapsed BUS_CLK cycles for block sizes 1, 2, 4, ..., 4095 bytes. Red: memset, Blue: memcpy, Green: dmemset, Fuchsia: dmemcpy
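The speed-up ratios quoted above, and the 64-byte break-even point from the first post, can be recomputed directly from Table 1. This small sketch copies the table values into arrays; `break_even()` and `print_ratios()` are illustrative helper names. Since the table only samples powers of two (plus 4095), "break-even" here means the first tabulated size at which the DMA version needs fewer cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Table 1 data: elapsed BUS_CLK cycles (KIT-059, Creator 4.0). */
static const unsigned size_b[]    = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4095};
static const unsigned t_memset[]  = {34, 37, 48, 74, 120, 220, 405, 1034, 1560, 3097, 6165, 12539, 25265};
static const unsigned t_memcpy[]  = {34, 42, 56, 92, 156, 290, 552, 1080, 2136, 4260, 8681, 17368, 34707};
static const unsigned t_dmemset[] = {382, 366, 349, 354, 350, 369, 359, 364, 461, 762, 990, 1692, 3092};
static const unsigned t_dmemcpy[] = {375, 342, 340, 365, 339, 360, 364, 387, 455, 682, 1002, 1693, 3342};
#define N_ROWS (sizeof size_b / sizeof size_b[0])

/* Smallest tabulated size at which the DMA version takes fewer cycles
 * than the CPU version; returns 0 if it never does. */
static unsigned break_even(const unsigned *cpu, const unsigned *dma)
{
    for (size_t i = 0; i < N_ROWS; i++) {
        if (dma[i] < cpu[i]) {
            return size_b[i];
        }
    }
    return 0;
}

/* Print the CPU/DMA cycle ratio for every row of the table. */
static void print_ratios(void)
{
    for (size_t i = 0; i < N_ROWS; i++) {
        printf("%4u bytes: memset/dmemset = %5.2f, memcpy/dmemcpy = %5.2f\n",
               size_b[i],
               (double)t_memset[i] / t_dmemset[i],
               (double)t_memcpy[i] / t_dmemcpy[i]);
    }
}
```

For both memset and memcpy the first size where DMA wins is 64 bytes, and the 4095-byte row gives about 8.17 and 10.39, matching the ratios stated in the reply.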
/odissey1,
Thank you for your results and modified program. I'll have to check it out.
I was using a framing pulse and a scope to derive my timing results. My memset()/dmemset() ratio (8.10x) is virtually identical to yours; however, my memcpy()/dmemcpy() ratio was closer to 7.12x.
In general, the actual DMA transfer is significantly faster than the measured times suggest: the additional time in each function is the CPU time spent setting up the DMA and then waiting for it to complete. For a large transfer (4095 bytes), the actual DMA transfer accounts for about 90% of the time I logged. For smaller transfers the CPU overhead is more significant, which is why 64 bytes is the practical lower limit (memcpy()/dmemcpy() ~ 1) for these DMA-driven functions compared to the standard functions.
On a related note: for fun, I'm trying to implement a DMA-driven equivalent of the strlen() function. I've got the DMA hardware state machine working for the comparison against the NULL character ('\0') at the end of the string. However, when my digital comparator component detects it and drives an active '1' onto the TRQ input of the DMA, the DMA transfer does not terminate. I've never needed to work with TRQ before, so I'm assuming I set something up incorrectly.
If you're willing to take a look at it, I would appreciate it. I posted my project at the following link:
Looking-for-proper-use-of-CyDmaTdGetConfiguration
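As a way to reason about the intended strlen() behavior, here is a purely behavioral host model, not a hardware implementation. It assumes that each byte of one continuous burst passes the comparator, that a '\0' raises TRQ, and that TRQ ends the TD immediately. The length is then the number of bytes moved before termination minus the '\0' itself; on hardware that count would presumably come from the TD's remaining transfer count (the linked thread is about reading it back with CyDmaTdGetConfiguration()). The "minus one" is an assumption of this model: real pipeline latency could shift the count. `td_state`, `td_burst()` and `sim_dstrlen()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TD_MAX_COUNT 4095u  /* one TD can scan at most this many bytes */

/* Hypothetical TD state: only the remaining byte count is modeled. */
typedef struct {
    uint16_t remaining;
    int terminated;          /* set when TRQ ended the TD early */
} td_state;

/* One continuous burst: each byte passes the comparator; a '\0' raises
 * TRQ, which (if termination works as documented) ends the TD. */
static void td_burst(td_state *td, const uint8_t *src)
{
    while (td->remaining > 0) {
        uint8_t byte = *src++;
        td->remaining--;
        if (byte == 0) {
            td->terminated = 1;
            return;
        }
    }
}

/* Length = bytes moved before termination, minus the '\0' itself.
 * Returns -1 if no terminator was seen within one TD. */
static long sim_dstrlen(const char *s)
{
    td_state td = { TD_MAX_COUNT, 0 };
    td_burst(&td, (const uint8_t *)s);
    if (!td.terminated) {
        return -1;
    }
    return (long)(TD_MAX_COUNT - td.remaining) - 1;
}
```

The model also shows a limit of the single-TD approach: a string longer than 4094 characters never reaches its '\0' within one TD, so the function has to report failure or fall back to the CPU strlen().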
"Engineering is an Art. The Art of Compromise."
Len,
I believe the DMA "trq" input works only when it is HIGH during the rising edge of the DMA "drq" signal. It is rather hard to synchronize, not very useful, and I have not seen it used.
I used the StopWatch component because it provides 1 BUS_CLK accuracy, whereas Pin_Write() takes about 1 µs.
/odissey1,
The termination of the DMA has the following definition:
CY_DMA_TD_TERMIN_EN - Terminate this TD if a positive edge on the trq input line occurs. The positive edge must occur during a burst. That is the only time the DMAC will listen for it.
The burst for strlen() is continuous for up to 4095 bytes. Based on the datasheet, I don't see why a rising edge on TRQ during a continuous burst wouldn't terminate the TD.
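The two readings of TRQ in this exchange predict different outcomes for a continuous burst, which can be made concrete with a toy model. Each array element is one transfer cycle of a single burst, with drq rising once at cycle 0. Reading A follows the quoted CY_DMA_TD_TERMIN_EN description (any rising edge of trq during the burst ends the TD); reading B follows the earlier reply (trq only counts if it is HIGH when drq rises). Neither function models real PSoC timing, and the names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Reading A: a rising edge of trq anywhere in the burst terminates the TD.
 * Returns the cycle index of termination, or -1 if the TD runs to the end. */
static int terminate_on_trq_edge(const int *trq, size_t cycles)
{
    int prev = 0;
    for (size_t i = 0; i < cycles; i++) {
        if (trq[i] && !prev) {
            return (int)i;
        }
        prev = trq[i];
    }
    return -1;
}

/* Reading B: trq is only looked at when drq rises. In one continuous
 * burst that happens once, at cycle 0. */
static int terminate_on_drq_edge(const int *trq, size_t cycles)
{
    return (cycles > 0 && trq[0]) ? 0 : -1;
}
```

With a comparator output that goes HIGH at cycle 3 of an 8-cycle burst, reading A terminates at cycle 3 while reading B never terminates. Reading B would therefore be consistent with the strlen() symptom described earlier, where the TD keeps running after the NULL character is found.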
"Engineering is an Art. The Art of Compromise."