Advice on attempting a parallel design

Anonymous · ‎Mar 17, 2015

Disclaimer: I apologize if this question is off-topic, but I'm quite new to the field of programmable logic and would like to draw on your collective experience to help me decide if a PSoC device is the right way to go. I don't mind putting in the work, but at this stage I'm lacking intuition about what is feasible and would like to avoid running into icebergs later on.

The project I am attempting is to build a hardware debugger for a recalcitrant microprocessor system. This system has a bus with 8 data + 16 address bits, plus control signals, running at 1 or 2 MHz.

My rough design for the real-time (i.e. hardware assisted) part looks something like this:

Route the bus to the parallel inputs of four datapaths, capturing each cycle.
Save the data from each cycle into a cyclic trace buffer via DMA.
Hash the address with a CRC.
Mask the checksum and use it to perform indexed DMA into a breakpoint table.
Compare the retrieved candidate breakpoint against the real addresses, optionally halting the CPU on a match.
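
As a rough software model of the steps above (not the actual UDB/DMA implementation), the trace buffer and the hashed breakpoint lookup might look like the sketch below. The table sizes, the CRC-16/CCITT polynomial, the one-candidate-per-slot layout, and all function names are assumptions made for illustration; in hardware the CRC would come from a datapath and the table read from an indexed DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_LEN 1024u        /* assumed size; power of two so the index wraps with a mask */
#define BP_SLOTS  64u          /* assumed size; power of two so the CRC can simply be masked */
#define BP_EMPTY  0xFFFFFFFFu  /* can never equal a 16-bit address */

typedef struct {
    uint16_t addr;
    uint8_t  data;
    uint8_t  ctrl;
} bus_cycle_t;

static bus_cycle_t trace[TRACE_LEN];
static uint32_t    trace_head;
static uint32_t    bp_table[BP_SLOTS];

/* CRC-16/CCITT (poly 0x1021, init 0xFFFF) over the two address bytes. */
static uint16_t crc16_addr(uint16_t addr)
{
    const uint8_t bytes[2] = { (uint8_t)(addr >> 8), (uint8_t)(addr & 0xFFu) };
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < 2; i++) {
        crc ^= (uint16_t)(bytes[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Empty the trace buffer and the breakpoint table. */
static void debugger_reset(void)
{
    assert((TRACE_LEN & (TRACE_LEN - 1u)) == 0u);
    assert((BP_SLOTS & (BP_SLOTS - 1u)) == 0u);
    trace_head = 0;
    for (size_t i = 0; i < BP_SLOTS; i++) {
        bp_table[i] = BP_EMPTY;
    }
}

/* Install a breakpoint; fails if another address already owns the slot. */
static bool bp_set(uint16_t addr)
{
    uint32_t slot = crc16_addr(addr) & (BP_SLOTS - 1u);
    if (bp_table[slot] != BP_EMPTY && bp_table[slot] != addr) {
        return false;
    }
    bp_table[slot] = addr;
    return true;
}

/* One captured bus cycle: record it, then hash, mask, look up, and compare.
 * Returns true when the address matches the candidate breakpoint. */
static bool on_bus_cycle(uint16_t addr, uint8_t data, uint8_t ctrl)
{
    trace[trace_head] = (bus_cycle_t){ addr, data, ctrl };
    trace_head = (trace_head + 1u) & (TRACE_LEN - 1u);

    uint32_t slot = crc16_addr(addr) & (BP_SLOTS - 1u);
    return bp_table[slot] == addr;
}
```

Because the final compare uses the full 16-bit address, a hash collision can only cause a failed `bp_set`, never a false breakpoint hit.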

In addition, this particular MPU cannot be halted at an arbitrary point. So a state machine needs to sequence through the instruction bytes in lock-step with the MPU, based on the opcode byte. Plus a slew of other details and complications too gnarly to mention.
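
To make the lock-step idea concrete, here is a minimal C model of such a sequencer. It assumes only instruction-stream bytes are fed to it (a real MPU also has data and internal cycles), and the `opcode_length` rule is a made-up placeholder, not the encoding of any real processor; a real design would use the MPU's actual opcode table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Tracks how many operand bytes of the current instruction are still pending. */
typedef struct {
    uint8_t operands_left;
} insn_tracker_t;

/* Hypothetical encoding: the low two opcode bits give the operand count. */
static uint8_t opcode_length(uint8_t opcode)
{
    return (uint8_t)(1u + (opcode & 0x03u));
}

static void tracker_reset(insn_tracker_t *t)
{
    assert(t != NULL);
    t->operands_left = 0;
}

/* True when the next byte on the bus will be an opcode,
 * i.e. the only point at which a halt would be requested. */
static bool tracker_at_boundary(const insn_tracker_t *t)
{
    assert(t != NULL);
    return t->operands_left == 0;
}

/* Feed one instruction-stream byte. Returns true if it was an opcode. */
static bool tracker_step(insn_tracker_t *t, uint8_t byte)
{
    assert(t != NULL);
    if (t->operands_left == 0) {
        t->operands_left = (uint8_t)(opcode_length(byte) - 1u);
        return true;
    }
    t->operands_left--;
    return false;
}
```

A breakpoint match would then be latched and acted on only once `tracker_at_boundary` reports true.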

I don't expect a detailed review but is this the sort of thing a PSoC is suited for or will I quickly hit timing/routing limits? For the record I've managed to build a prototype, around a PSoC5LP, and have sort-of gotten the trace running.

Regards,
Johan

HeLi_263931 · ‎Mar 17, 2015

I think the PSoC5 is suited for that - doing the complicated stuff in hardware is the main purpose of the whole UDB / Datapath.

I don't think you will hit timing problems with your external CPU running at 2MHz. The main problem might be that your state machine and decoder logic might eat up the UDBs.

That said, you should look at how much of the resources you are already using (look in the generated *.rpt file). You can move logic over to the datapath (you have probably already done that). There is a DFB block that is a small CPU core on its own, but it's quite difficult to use for your use case, I'm afraid.

Anonymous · ‎Mar 17, 2015

Thank you for the swift response! It is reassuring to hear I'm not entirely on the wrong track.

At the moment I'm basically only using four datapaths and a few PLDs for glue logic, and so at this stage I don't anticipate squeezing things into the 24 available UDBs to be a problem.

My main worry was running into timing and routing limits not immediately apparent, e.g. in shuffling the 32 bus signals back and forth between the parallel inputs/outputs of the UDBs. Every captured MPU cycle will require on the order of four short DMAs, and so I'm hoping that clocking the system at 20-40x the frequency will be sufficient. I have occasionally seen synthesis warnings about timing violations, but for now I'll chalk that up to not really having a clue what I'm doing.
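
The clock budget in the paragraph above can be checked with some quick arithmetic. The helper names below are made up for illustration, and the per-transfer DMA cost passed in is a placeholder that would have to come from the architecture TRM, not a known figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* System clocks available during one MPU bus cycle. */
static uint32_t clocks_per_bus_cycle(uint32_t sysclk_hz, uint32_t bus_hz)
{
    assert(bus_hz != 0u);
    return sysclk_hz / bus_hz;
}

/* System clocks each DMA transfer may take if they run back to back. */
static uint32_t clocks_per_transfer(uint32_t sysclk_hz, uint32_t bus_hz,
                                    uint32_t transfers)
{
    assert(transfers != 0u);
    return clocks_per_bus_cycle(sysclk_hz, bus_hz) / transfers;
}

/* True if `transfers` DMAs, each costing `dma_cost_clocks` (setup included;
 * this value is an assumption to be replaced with the TRM figure),
 * fit into a single bus cycle. */
static bool dma_budget_ok(uint32_t sysclk_hz, uint32_t bus_hz,
                          uint32_t transfers, uint32_t dma_cost_clocks)
{
    return transfers * dma_cost_clocks
           <= clocks_per_bus_cycle(sysclk_hz, bus_hz);
}
```

With a 2 MHz bus, a 20x clock is 40 MHz and leaves 20 clocks per bus cycle, so four DMAs get 5 clocks each; at 40x (80 MHz) they get 10 each. Whether that is enough depends on the real setup delay per transaction.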

Oh, and I'll read up on the DFB. I initially dismissed it as some sort of DSP unsuited for my workload, but it may prove to be useful, especially if I decide to attempt real-time parallel emulation of the CPU for non-intrusive tracing.

HeLi_263931 · ‎Mar 17, 2015

You need to look carefully at what the architecture TRM says about the DMA. Each DMA transaction incurs a setup / start delay, so you need to check whether your timing still holds. The internal routing is quite fast; even complex designs can run at 16 MHz or more.

It might be better to handle the breakpoint logic within a 16-bit UDB; it should be able to compare addresses much faster.

Yes, the DFB is basically a DSP engine, but it can do arithmetic quite well and has a small memory. But as I said, it will be difficult to use, especially since the documentation around it is still sparse.
