I think the PSoC5 is suited for that - doing the complicated stuff in hardware is the main purpose of the whole UDB / Datapath.
I don't think you will hit timing problems with your external CPU running at 2 MHz. The main problem might be that your state machine and decoder logic eat up the UDBs.
That said, you should check how much of the resources you are already using (look in the generated *.rpt file). You can move logic over to the datapath (you probably already did that). There is a DFB block that is a small CPU core of its own, but I'm afraid it's quite difficult to use for your use case.
Thank you for the swift response! It is reassuring to hear I'm not entirely on the wrong track.
At the moment I'm basically only using four datapaths and a few PLDs for glue logic, and so at this stage I don't anticipate squeezing things into the 24 available UDBs to be a problem.
My main worry was running into timing and routing limits that aren't immediately apparent, e.g. in shuffling the 32 bus signals back and forth between the parallel inputs/outputs of the UDBs. Every captured MPU cycle will require on the order of four short DMAs, so I'm hoping that clocking the system at 20-40x the MPU frequency will be sufficient. I have occasionally seen synthesis warnings about timing violations, but for now I'll chalk that up to not really having a clue what I'm doing.
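To make the 20-40x figure concrete, here is a rough budget calculator. It is only a sketch under stated assumptions: the system clock is an integer multiple of the 2 MHz bus clock, and the four DMAs per captured cycle share that cycle's clocks evenly (real transfers won't be that uniform). The function name `bus_cycle_budget` is hypothetical.

```python
def bus_cycle_budget(sys_hz: int, bus_hz: int, dmas_per_cycle: int) -> dict:
    """Rough clock budget for capturing one external bus cycle.

    Assumes sys_hz is an integer multiple of bus_hz and that the DMAs
    issued for one bus cycle split its clocks evenly (an idealisation).
    """
    if bus_hz <= 0 or sys_hz < bus_hz or dmas_per_cycle <= 0:
        raise ValueError("need sys_hz >= bus_hz > 0 and dmas_per_cycle > 0")
    clocks = sys_hz // bus_hz  # system clocks per external bus cycle
    return {
        "bus_cycle_ns": 1e9 / bus_hz,
        "clocks_per_bus_cycle": clocks,
        "clocks_per_dma": clocks // dmas_per_cycle,
    }


if __name__ == "__main__":
    # 2 MHz MPU, four DMAs per captured cycle, system clock at 20x and 40x.
    for mult in (20, 40):
        print(f"{mult}x:", bus_cycle_budget(2_000_000 * mult, 2_000_000, 4))
```

At 20x (40 MHz) a 500 ns bus cycle leaves 20 clocks, i.e. roughly five clocks per DMA; at 40x (80 MHz) it is about ten. Either way there is little room for any fixed per-transaction overhead.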
Oh, and I'll read up on those DFBs. I initially dismissed them as some sort of DSP unsuited to my workload, but they may prove useful, especially if I decide to attempt real-time parallel emulation of the CPU for non-intrusive tracing.
Look carefully at the DMA section of the architecture TRM. Each DMA transaction incurs a setup/start delay, so you need to check whether your timing is still valid. The internal routing is quite fast: even complex designs can run at 16 MHz or more.
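One way to do that check is to add the per-transaction overhead to the budget and see whether the DMAs for one bus cycle still fit. This is a sketch, not TRM data: `setup_clocks` and `transfer_clocks` are placeholders you would fill in from the DMA chapter for your own configuration, the DMAs are assumed to run back-to-back rather than overlap, and `margin_pct` is a hypothetical headroom factor for jitter. The numbers in the usage note are made up for illustration.

```python
def dma_chain_fits(sys_hz: int, bus_hz: int, dmas_per_cycle: int,
                   setup_clocks: int, transfer_clocks: int,
                   margin_pct: int = 80):
    """Check whether back-to-back DMAs fit into one external bus cycle.

    setup_clocks / transfer_clocks are PLACEHOLDERS: take the real
    per-transaction latencies from the architecture TRM. Only margin_pct
    percent of the cycle is treated as usable, to leave headroom.
    Returns (fits, clocks_needed, clocks_available).
    """
    if bus_hz <= 0 or sys_hz < bus_hz:
        raise ValueError("need sys_hz >= bus_hz > 0")
    available = (sys_hz // bus_hz) * margin_pct // 100
    needed = dmas_per_cycle * (setup_clocks + transfer_clocks)
    return needed <= available, needed, available
```

With hypothetical costs of 6 setup + 2 transfer clocks, four DMAs need 32 clocks, which does not fit the 16 usable clocks at 40 MHz; with 4 + 2 clocks at 80 MHz they need 24 of 32 and do fit. Routing/static timing (the synthesis warnings) is a separate constraint that this arithmetic does not capture.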
It might be better to handle the breakpoint logic within a 16-bit UDB; it should be able to compare addresses much faster.
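The point of doing the compare in hardware is that it happens in parallel on every bus cycle, with no DMA traffic at all. The behavioural model below only shows what such a compare would compute, assuming a 16-bit address bus and an optional mask for matching address ranges; `Breakpoint`, `breakpoint_hit` and `first_hit` are hypothetical names, and this is not a description of how the datapath itself is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical breakpoint: fires when (addr & mask) == (target & mask)."""
    target: int
    mask: int = 0xFFFF  # full 16-bit compare by default


def breakpoint_hit(addr: int, bp: Breakpoint) -> bool:
    """Model the result of a masked 16-bit address compare for one bus cycle."""
    addr &= 0xFFFF  # assume a 16-bit address bus
    return (addr & bp.mask) == (bp.target & bp.mask)


def first_hit(trace, breakpoints):
    """Return (cycle_index, address) of the first cycle hitting any breakpoint, else None."""
    for i, addr in enumerate(trace):
        if any(breakpoint_hit(addr, bp) for bp in breakpoints):
            return i, addr
    return None
```

A mask of 0xFF00 turns a single-address breakpoint into a 256-byte page match, e.g. `Breakpoint(0x1200, 0xFF00)` fires for any address 0x12xx.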
Yes, the DFB is basically a DSP engine, but it can do arithmetic quite well and has a small memory. But as I said, it will be difficult to use, especially since the documentation around it is still sparse.