Each SPI slave needs an ss-line which you can assert programmatically. There is no need to use the ss-line from the SPI component. That way you are limited only by the number of pins available.
You are not "grabbing" the data from the sensors: you send dummy bytes, and for each byte sent you get one byte back, so there is no need for a circular buffer. The problem might be that you seem to have more than 8 sensors, which you will have to access one after the other, so collecting the data is the first bottleneck. A way out could be to use more than one SPI master, thus reducing the time required.
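To illustrate the one-after-the-other access with software-controlled ss-lines, here is a rough host-side model in Python. It is only a sketch of the logic, not firmware: FakeSensor, transfer and read_all are made-up names, and the sensor data is invented. On the real device, select() would be a pin write and transfer() the SPI component's byte exchange.

```python
class FakeSensor:
    """Hypothetical stand-in for an SPI slave: returns canned bytes while selected."""

    def __init__(self, data):
        self.data = list(data)
        self.selected = False
        self.pos = 0

    def select(self, active):
        # Asserting SS restarts the sensor's output sequence.
        if active and not self.selected:
            self.pos = 0
        self.selected = active

    def exchange(self, mosi_byte):
        # A deselected slave does not drive MISO; modeled here as None.
        if not self.selected:
            return None
        value = self.data[self.pos % len(self.data)]
        self.pos += 1
        return value


def transfer(sensors, mosi_byte):
    """One byte exchange: all slaves see the clock, only the selected one answers."""
    replies = [r for r in (s.exchange(mosi_byte) for s in sensors) if r is not None]
    assert len(replies) == 1, "exactly one slave must be selected"
    return replies[0]


def read_all(sensors, nbytes, dummy=0x00):
    """Poll the sensors in sequence, each behind its own SS line."""
    results = []
    for s in sensors:
        s.select(True)   # assert this slave's SS (an ordinary output pin)
        results.append([transfer(sensors, dummy) for _ in range(nbytes)])
        s.select(False)  # release SS before moving to the next slave
    return results


# Ten invented sensors, each answering with two known bytes.
sensors = [FakeSensor([i, i + 0x10]) for i in range(10)]
readings = read_all(sensors, nbytes=2)
print(readings[3])  # [3, 19]
```

Note that the total time grows with (number of sensors) x (bytes per sensor), which is why splitting the sensors over several SPI masters shortens the collection time.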
UART transmission is not a problem. The component can be configured with a larger buffer, from which the data is fetched automatically (interrupt-driven). Only when the buffer is full will a write to the UART block until there is room in the buffer again.
If I had, say, 8 identical sensors, is there a way to send the same clock, ss, and dummy bytes to all devices simultaneously and read back all 8 sets of data at the same time? Presumably this might require writing some Verilog?
"Presumably this might require writing some verilog": Yes!
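What the custom logic would have to do can be modeled roughly as follows: clock, ss and MOSI are shared, each sensor has its own MISO line, and on every clock all 8 MISO lines are sampled into 8 separate shift registers. This is only a Python model of that behavior under assumptions (MSB-first transfer, 8-bit words, invented sensor values, made-up function names), not Verilog and not a finished design.

```python
def parallel_read_byte(sample_miso):
    """Shift in one byte from 8 sensors at once.

    sample_miso(clock_index) must return an 8-bit word in which
    bit k is the current MISO level of sensor k (hypothetical interface).
    """
    regs = [0] * 8                      # one shift register per sensor
    for clock_index in range(8):        # 8 shared clock pulses, MSB first (assumed)
        lines = sample_miso(clock_index)
        for k in range(8):
            bit = (lines >> k) & 1
            regs[k] = ((regs[k] << 1) | bit) & 0xFF
    return regs


# Invented test data: the byte each simulated sensor shifts out.
sensor_bytes = [0x00, 0x01, 0x7F, 0x80, 0xA5, 0x5A, 0xFE, 0xFF]


def sample_miso(clock_index):
    """Simulate the 8 MISO pins for a given clock pulse."""
    lines = 0
    for k, value in enumerate(sensor_bytes):
        bit = (value >> (7 - clock_index)) & 1   # MSB goes out first
        lines |= bit << k
    return lines


result = parallel_read_byte(sample_miso)
print(result == sensor_bytes)  # True
```

All 8 bytes arrive in the time of a single byte transfer, at the cost of 8 input pins and the extra shift registers in hardware.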