Hi,
In preparation for migrating some custom components from Verilog to the UDB ALU function block, I used the parallel PI/PO example of AN82156 as a base. This example is an 8-bit adder where one summand comes from the D0 register, written by the CPU, and the other summand comes from the UDB parallel input. The example component dynamically switches the ALU input between A0/A1 and the parallel input PI, and uses two states: fetch PI and add.
AN82156 uses the state diagram below:
The functional Verilog code:
STATE_LOAD:
begin
    state <= STATE_ADD;
    /* we must latch the PO value here, because in the next state PO is not valid */
    Parallel_Out <= po;
end
STATE_ADD:
begin
    state <= STATE_LOAD;
end
So, from the above I'd expect the component needs two clock cycles to get the result ready. However, it seems it takes three clock cycles until the result is ready. Can anyone explain why three clock pulses are needed?
In the datapath documentation, the "Ax WR source" description explicitly states that the Ax register is written _after_ the ALU operation has completed, so an ALU operation can't be expected to finish at the instant the rising clock edge occurs. But from the documentation I'd expect the result of an ALU instruction to be ready when the next rising clock edge occurs, so the above should need only two clock cycles. Where am I wrong? Does each ALU operation need two clock cycles until it is ready, with the next operation starting on the next clock edge (interleaved operation)?
Regards
Labels: PSoC 5 Architecture, PSoC 5LP
Hello BharadhwajaS_91,
I've run some additional tests. My initial statement is partially wrong: the component doesn't need three clock cycles in general, but the result becomes available with an offset of one clock pulse. I wrote a small test program to verify this; the flow is as follows:
1) set the parallel input value, starting at 0x00 and incrementing on each run; D0 (Add_Value) remains at 0x02 as prepared in the example
2) force a clock pulse //transition from state = load to state = add
3) read the output
4) force a clock pulse //transition from state = add to state = load
5) read the output
6) back to #1
The output:
First clock pulse, input 0, result 0 //after state = load
Second clock pulse, input 0, result 0 //after state = add (here's the point where I'd expect result = input + 2)
First clock pulse, input 1, result 2 //next run, input incremented, but result from previous run => offset of one clock cycle
Second clock pulse, input 1, result 2
First clock pulse, input 2, result 3
Second clock pulse, input 2, result 3
First clock pulse, input 3, result 4
...
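The test flow and the sequence above can be reproduced with a small software model. This is only a sketch, not code from AN82156 or from the generated component: `adder_model`, `clock_pulse` and `run_test_flow` are hypothetical names, and the model assumes that PO shows the A0 register as it was before the clock edge (so the sum computed in the add state is only latched into Parallel_Out on the following load state). Under that assumption the model yields the same one-pulse offset.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical software model of the two-state adder; not PSoC code. */
enum adder_state { STATE_LOAD, STATE_ADD };

struct adder_model {
    enum adder_state state;
    uint8_t a0;           /* accumulator A0 */
    uint8_t d0;           /* Add_Value, written once by the CPU */
    uint8_t parallel_out; /* output register latched in STATE_LOAD */
};

/* One rising clock edge. Every right-hand side uses the value from
 * before the edge, like Verilog non-blocking assignments. */
void clock_pulse(struct adder_model *m, uint8_t pi)
{
    /* ASSUMPTION: PO reflects A0 as it was before this edge. */
    uint8_t po = m->a0;

    if (m->state == STATE_LOAD) {
        m->parallel_out = po;   /* Parallel_Out <= po */
        m->a0 = pi;             /* A0 <= PI */
        m->state = STATE_ADD;
    } else {
        m->a0 = (uint8_t)(m->a0 + m->d0); /* A0 <= A0 + D0 */
        m->state = STATE_LOAD;
    }
}

/* Steps 1 to 6 of the test flow: two pulses per input value, reading
 * the output after each pulse. results must hold 2 * runs entries. */
void run_test_flow(struct adder_model *m, uint8_t runs, uint8_t *results)
{
    for (uint8_t input = 0; input < runs; input++) {
        clock_pulse(m, input);
        results[2 * input] = m->parallel_out;
        printf("First clock pulse, input %u, result %u\n",
               (unsigned)input, (unsigned)m->parallel_out);

        clock_pulse(m, input);
        results[2 * input + 1] = m->parallel_out;
        printf("Second clock pulse, input %u, result %u\n",
               (unsigned)input, (unsigned)m->parallel_out);
    }
}
```

Starting from `struct adder_model m = { STATE_LOAD, 0, 0x02, 0 };` and calling `run_test_flow(&m, 4, results)` gives the results 0, 0, 2, 2, 3, 3, 4, 4, which matches the listing above. Whether the real datapath behaves this way for the same reason is not confirmed here.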
So, I modified the parallel adder UDB implementation:
1) the dynamic reg 0 configuration is changed to ADD, SRCA = A0, SRCB = D0, A0_WR_SRC = ALU, A1_WR_SRC = NONE
2) the dynamic reg 0 configuration is copied into reg 1, so both are now identical
3) STATE_ADD is modified so that it now also latches PO
4) a ready output signal is added, driven by the inverted state of the state machine
The output:
First clock pulse, input 0, result 0
Second clock pulse, input 0, result 2 //output ready on the second clock pulse as expected
First clock pulse, input 1, result 0
Second clock pulse, input 1, result 3
First clock pulse, input 2, result 0
Second clock pulse, input 2, result 4
First clock pulse, input 3, result 0
...
Note that the result is read through a status register in sticky configuration whose clock input is the ready signal. This verifies that the result changes with the rising edge of the ready signal (that's why the result after the first clock pulse is always zero).
So, it seems that this modification improves the parallel adder functionality. I assume(!) the PI/PO example uses the two different state configurations for simplification, but the clock offset was not taken into account.
The above modifications are quick'n'dirty, but they were really helpful in learning UDB. The next step is to extend the adder so that both summands come from the parallel input. I'm not sure whether this can also be done in two cycles.
Regards