PSoC6 UDB DMA


Anonymous
Not applicable

Hi All,

I'm having difficulty running DMA to UDB FIFOs reliably. The core/PeriBus is clocked at 98MHz and the UDB ALUs are running at PeriBus/2.

Initially the DMA was over-running the UDB FIFOs, as there didn't appear to be any throttling of the DMA (no automatic routing of FIFO full/nempty signals to the DMA engine).

I have routed the f0_bus_stat and f1_bus_stat signals as DMA request signals (f0 for incoming data and f1 for outgoing data). The FIFO nfull signal activates when there are at least 2 free spaces in the FIFO, and nempty activates when there are at least 2 bytes to be read. I've set these levels because the FIFOs were still clearly being over/underflowed by the DMA.
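
For anyone wanting to reproduce this, the level bits can also be set from firmware via the datapath auxiliary control register, roughly as follows. The register define and bit positions below are assumptions from memory, not from my project; check them against cyfitter.h and the UDB chapter of the TRM:

```c
/* Assumed cyfitter.h name for the datapath auxiliary control register; the UDB
 * AUX_CTL conventionally has FIFO0 LVL at bit 2 and FIFO1 LVL at bit 3
 * (0 = bus_stat follows not-full/not-empty, 1 = assert only at >= 2 of 4 bytes). */
#define DP_AUX_CTL_FIFO0_LVL    (1u << 2)
#define DP_AUX_CTL_FIFO1_LVL    (1u << 3)

volatile uint32_t *auxCtl = (volatile uint32_t *)dmatest_dp__DP_AUX_CTL_REG; /* assumed define */
*auxCtl |= (DP_AUX_CTL_FIFO0_LVL | DP_AUX_CTL_FIFO1_LVL);
```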

This has led me to believe that the FIFO request signals are either too slow or are extended, such that a request for two DMA transfers actually results in more transfers occurring.

I managed to get some stability by reducing the DMA to transfer only one byte per request (even though the levels are set at 2). This appears to allow enough time for the bus status signal to update when the UDB is running at low throughput. However, at high traffic levels the DMA appears unable to keep up, presumably because of the overhead involved in each transfer request.

I've read the Component Author Guide chapters 7 and 11, but it would be easier to debug if there were more information about how the bus_stat signals are clocked (to a similar level of detail as for the blk_stat signals).

I'm considering trying to condition the request signals into a pseudo-pulsed format, but suspect that this is risky.

I'm aiming for a sustained 6.5MBytes/sec transfer rate for a block of 512 bytes.

I would welcome any advice or experience on reliably running DMA to UDB FIFOs, and any additional considerations I should take into account.

Regards

Zig

8 Replies
Yeshwanth_KT
Employee

Hello,

Can you please attach the project so that it becomes easier to debug? Also, is there a specific purpose for selecting those clock values?

Thanks,

Yeshwanth

Anonymous
Not applicable

Hi Yeshwanth,

Thanks for replying. I'm afraid the project is rather large and depends on custom external hardware, so posting it probably wouldn't help. I'll see if I can create a small self-contained example that shows the problem.

The clock values stem from needing the M0+ to run close to 100MHz, hence clk_slow and clk_peri need to be 98MHz. The UDB will not meet timing at that speed, so I divide clk_peri by 2 for some bus operations, and I am actually running the ALUs significantly slower than that. I can just about meet timing at 25MHz for the ALUs, so I aim to use clk_peri/4 eventually, but for now I am running at clk_peri/10.

I expected the UDB FIFO bus nfull and nempty to be synchronised to the bus clock, and to respond pretty quickly to a write, but even if I set the DMA to only retrigger after 16 CLK_SLOW cycles, I still think I am seeing FIFO overruns when I write into the FIFO.

It's suddenly become a bit more difficult because my scope is in the shop, so my visibility has diminished for a few days.

If I manage to get an example program running, I'll post it here.

Thanks again.

Zigs


Hello Zigs,

Can you provide some more information regarding your FIFO configuration and DMA descriptor configuration? Were you able to monitor the fx_bus_stat signals and see how they change with DMA transfers? This will help in further pinning down the timing.

Are you running the FIFO in Async mode? See page 382 of the TRM.

It is difficult to provide specific recommendations without the project.

You can also refer to section 9.5.5 in TRM to see the DMA performance related specs and that might give some more idea on how long DMA takes to transfer one element.

Regards,

Meenakshi Sundaram R

Anonymous
Not applicable

Hi Meenakshi,

I am in the process of creating a mini project which I expect will reveal the problem in more detail. When I get it working, or if it reveals the issue, I'll post it here.

I am trying to DMA a buffer of 512 bytes to a UDB. The UDB provides an external port which has no ability to pause, so once a transfer is started it must complete without loss of bytes. So timing is key, and basically I have to know that the DMA can run faster than the external port. I can fix the clock speed of the external port to allow for this, but I do need a certain level of performance to match the application.

So the M0+ is running at 98MHz (and therefore so are clk_peri and slow_clk). The ALUs are currently clocked at slow_clk/10, and data is sent out one byte per 8 clocks, so the ALU should be reading the incoming FIFO at about 0.98MHz. Very slow compared to the DMA.

I write a count of 0 -> 255 into the FIFO by DMA, and when that data is later checked, I see some data missing; the checked data may read 0, 1, 2, 3, 4, 8, 9, 10... This appears consistent with the DMA filling the FIFO, but with the DMA request derived from f0_fifo_nfull (f0_bus_stat) being held active longer than it should be. I assume from this that the DMA request is clocked from a different clock than the bus, but I'm not sure which clock is used. I believe other data may also get lost, but I'm concentrating on this initial loss of data, which appears consistent.

The DMA descriptor is set for an Xloop of 2 and a Yloop of 128, and I have tried the trigger input type set to "one transfer per trigger" and "one Xloop per trigger". I have also tried the trigger deactivation set to "immediate" and "Retrigger after 16 Clk_Slow cycles".
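
For reference, the descriptor setup in PDL terms looks roughly like the sketch below. It is hand-written here rather than copied from my project, so the dmaDescriptor/dmaCfg names are placeholders, and the F0 define is the one from cyfitter.h (mentioned again further down); please verify the field and enum names against cy_dma.h in your PDL version:

```c
#include "cy_dma.h"
#include "cyfitter.h"

static uint8_t buffer[256];                       /* 0..255 test pattern        */
static cy_stc_dma_descriptor_t dmaDescriptor;     /* placeholder name           */

static void ConfigureDmaDescriptor(void)
{
    cy_stc_dma_descriptor_config_t dmaCfg =
    {
        .retrigger       = CY_DMA_RETRIG_IM,           /* or CY_DMA_RETRIG_16CYC            */
        .interruptType   = CY_DMA_DESCR,
        .triggerOutType  = CY_DMA_DESCR,
        .channelState    = CY_DMA_CHANNEL_ENABLED,
        .triggerInType   = CY_DMA_1ELEMENT,            /* or CY_DMA_X_LOOP (one Xloop per trigger) */
        .dataSize        = CY_DMA_BYTE,
        .srcTransferSize = CY_DMA_TRANSFER_SIZE_DATA,
        .dstTransferSize = CY_DMA_TRANSFER_SIZE_DATA,
        .descriptorType  = CY_DMA_2D_TRANSFER,
        .srcAddress      = buffer,
        .dstAddress      = (void *)dmatest_dp__F0_REG, /* UDB F0 FIFO register from cyfitter.h */
        .srcXincrement   = 1,                          /* step through the buffer           */
        .dstXincrement   = 0,                          /* always write the same FIFO reg    */
        .xCount          = 2u,
        .srcYincrement   = 2,
        .dstYincrement   = 0,
        .yCount          = 128u,                       /* 2 x 128 = 256 bytes               */
        .nextDescriptor  = &dmaDescriptor,             /* chain to itself (NULL for one-shot) */
    };

    (void)Cy_DMA_Descriptor_Init(&dmaDescriptor, &dmaCfg);
}
```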

I need to sit down and calculate data rates, re-run through section 9.5.5 (thanks for the pointer) and see if this works at a system level. However, fundamentally, I think the main issue is understanding the clocking of the fx_bus_stat signals. I have a feeling that clocking the UDBs faster might actually help, as the flag may get cleared sooner. Once I get my mini project running, I'll try to measure the flags through a test pin.

I have tried FIFO ASYNC + ADD_SYNC = NONE, and FIFO ASYNC + ADD_SYNC = ADD, i.e. double buffering. However, the TRM and other documents indicate that this affects the fx_blk_stat signals and not the fx_bus_stat signals. Do you have reason to believe that it affects both?

I really appreciate your thoughts. My thanks for taking the time to respond.

Kind regards

Zigs


Hello Zigs,

Yes, the ASYNC mode only deals with the blk_stat signal and does not impact the bus_stat signal.

From your description, I do not see any issue in your configuration. I think the best way to debug this would be to route the bus_stat signals out and observe their behavior.

You can probably use "one transfer per trigger" and "Retrigger after 4 or 16 Clk_Slow cycles" with Fx_LVL = 0 (which triggers if there is at least one free space in the FIFO), as the bus_stat signal should be asserted as long as there is space in the FIFO and will (ideally) let you transfer bytes until it is full. In this case you can also eliminate the Yloop overhead, i.e. use 256 bytes as the Xloop size, since your transfer is byte-by-byte.
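
In PDL terms that would be roughly the following (an untested sketch reusing the placeholder dmaCfg/dmaDescriptor names from the descriptor sketch earlier in this thread; verify the enum names against cy_dma.h):

```c
/* 1D transfer, whole 256-byte buffer as the Xloop, one element per trigger,
 * retrigger after 16 clk_slow cycles (CY_DMA_RETRIG_4CYC is the 4-cycle option).
 * The F0 level bit (Fx_LVL) is left at 0 so bus_stat asserts while any space is free. */
dmaCfg.descriptorType = CY_DMA_1D_TRANSFER;
dmaCfg.xCount         = 256u;
dmaCfg.triggerInType  = CY_DMA_1ELEMENT;
dmaCfg.retrigger      = CY_DMA_RETRIG_16CYC;

(void)Cy_DMA_Descriptor_Init(&dmaDescriptor, &dmaCfg);
```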

Also, what data element size and destination address register are you using for the DMA?

Regards,

Meenakshi Sundaram R

Anonymous
Not applicable

Hi Meenakshi,

Firstly, a mistake on my side... I need to do DMA transfers of 512 bytes, so I actually need XLoop=2 and YLoop=256. So sadly I think I do still need the YLoop. I use XLoop=2 to give me the option to transfer two bytes per trigger request.

I now have a test program almost working, so I will post results soon. Initial tests appear to be working, but I know it is not yet quite representative of the full application.

When the f0_nfull flag asserts, it takes 200ns to deassert (20 DMA slow_clk cycles). This seems to be the case irrespective of whether the trigger input type causes a single byte transfer or a complete XLoop (2 bytes).

I am running the DMA into a single UDB FIFO, so the width is byte, and I am transferring from a uint8_t buffer[512]. The transfer settings are BYTE and BYTE_TO_BYTE. The destination register is the UDB FIFO as extracted from the cyfitter.h file (e.g. dmatest_dp__F0_REG).
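
For completeness, the channel-side setup is roughly the following (again a sketch with assumed names; the DW instance, channel number and the trigger routing from f0_bus_stat to the channel come from the design, so they will differ in the real project):

```c
/* Channel setup sketch (PDL cy_dma). The descriptor is the one configured in
 * the earlier sketch; trigger routing to the channel is done in the schematic. */
cy_stc_dma_channel_config_t chanCfg =
{
    .descriptor  = &dmaDescriptor,
    .preemptable = false,
    .priority    = 0u,
    .enable      = false,
};

(void)Cy_DMA_Channel_Init(DW0, 0u, &chanCfg);   /* DW instance/channel are assumptions */
Cy_DMA_Channel_Enable(DW0, 0u);
Cy_DMA_Enable(DW0);
```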

The PSoC Creator Component Author Guide talks about spokes in section 7.4.3.5. I am not sure whether the transfers I have are inter-spoke or intra-spoke. The document discusses the efficiencies involved, but I'm not sure how to force a transfer to be intra-spoke. Perhaps this just relates to PSoC 3 and PSoC 5, as it is part of the document related to the DMA Wizard, which doesn't exist for the PSoC 6.

If I get the simple example working, I will compare the results with my full application and will let you know.

Once again, my thanks for your time on this.

Zigs

Anonymous
Not applicable

Hi Meenakshi,

I think I have about as good a handle on this as I am going to get by looking at the results of my simple code, which has the DMA transferring 2 bytes per request. The following can be inferred from my tests:

a) The first transfer takes 200ns from the DMA request asserting to byte 1 being written into the FIFO (which removes the DMA request).

b) Byte 2 takes a further 120ns to be written into the FIFO.

I can see this by setting the UDB data consumption to just over one byte per 320ns. In this case I see a single continuous DMA request pulse per two-byte transfer, and that pulse is 200ns wide (the time from the request to completion of the first byte write).

If I speed this up a little so the UDB consumes one byte per 312ns, then the following happens:

a) t = 0ns: The UDB extracts one byte, causing the DMA request to go high and starting the transfer of 2 bytes into the FIFO.

b) t = 200ns: The DMA request goes low, showing that one byte has been written into the FIFO by the DMA.

c) t = 312ns: The UDB extracts the next byte just before the DMA writes the second byte, causing the DMA request to go high again.

d) t = 320ns: The DMA request rapidly goes low again as the DMA writes the second byte into the FIFO (the FIFO now has 3 bytes in it).

e) t = 624ns: The next byte is taken by the UDB and the process starts again at (a), but the runt pulse gets larger.

f) Eventually the FIFO goes empty.

My only hope to improve the performance is to find a way to improve the transfer rate of the DMA. Perhaps the intra/inter-spoke configuration might help, if it still exists?
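
A quick back-of-envelope check on those numbers (my own arithmetic, so worth verifying): 200ns + 120ns is roughly 320ns per pair of bytes, i.e. about 6.25 MBytes/sec at best, whereas the 6.5 MBytes/sec target needs a byte roughly every 154ns (about 308ns per pair). So even before allowing for other bus traffic, the measured DMA rate falls just short of the target.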

As it stands, I think I'll have to run slightly slower than this to allow other events to occur on the bus while the DMA is ongoing.

I also don't quite understand the description of the WAIT_FOR_DEACT field in section 9.3.1 of the TRM. Setting this to 0 (Pulse Trigger) says that it doesn't wait for deactivation. Does that mean that a level on the trigger signal will cause a new trigger? I had thought so, but current tests indicate that the trigger still has to go inactive to cause a second trigger. I need to look into this more closely, but I find the wording of this section quite confusing.

I'll post more as I figure it out. It might be useful to others.


Zigs,

Can you attach your simple project to this post?

The intra/inter-spoke concept is not applicable to PSoC 6 devices. The way to improve DMA performance is to use a higher clock and fewer dimensions (i.e. a 1D 256-byte transfer is faster than a 2D 2x128-byte transfer).

WAIT_FOR_DEACT waits for deactivation of the trigger (the trigger going low) before it checks for the next trigger (going high).
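
In PDL terms this maps onto the retrigger field of the DMA descriptor (enum names below are from the cy_dma driver as far as I recall; please verify against cy_dma.h in your PDL version):

```c
cy_stc_dma_descriptor_config_t cfg;

cfg.retrigger = CY_DMA_RETRIG_IM;          /* pulse trigger: do not wait for deactivation   */
/* cfg.retrigger = CY_DMA_RETRIG_4CYC;        retrigger after 4 clk_slow cycles             */
/* cfg.retrigger = CY_DMA_RETRIG_16CYC;       retrigger after 16 clk_slow cycles            */
/* cfg.retrigger = CY_DMA_WAIT_FOR_REACT;     wait for deactivation before the next trigger */
```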

If you can share the simple project, we might be able to understand the issue better and provide additional suggestions.

Regards,

Meenakshi Sundaram R
