These LUTs get compiled into simple AND/OR/NOT gates and mapped onto the macrocells (MCs) of the UDBs. Each MC can hold two output signals. How the logic can be distributed depends on the actual logic equations, so it might be that one UDB can hold only one signal and not two. This might lead to resource exhaustion.
When you are implementing a state machine in your LUT, you can try to implement it manually (one DFF per state, a common clock, and logic for each DFF to set its state). This _might_ be more efficient, but it's not guaranteed.
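To make the "one DFF per state" idea concrete, here is a rough behavioural sketch in Python (not something you can paste into PSoC Creator). The three states IDLE/RUN/DONE, the `go` input, and the function names are made up for illustration; the point is only that each flip-flop gets its own small AND/OR/NOT equation.

```python
def next_state(idle, run, done, go):
    """Return the D inputs of the three flip-flops (one per state).

    Hypothetical machine: IDLE -> RUN when go is high, RUN holds while
    go is high, RUN -> DONE when go drops, DONE -> IDLE unconditionally.
    """
    d_idle = (idle and not go) or done
    d_run = (idle and go) or (run and go)
    d_done = run and not go
    return d_idle, d_run, d_done


def simulate(inputs):
    """Clock the one-hot machine once per input sample, starting in IDLE."""
    state = (True, False, False)  # reset value: only the IDLE flip-flop is set
    trace = []
    for go in inputs:
        # All flip-flops share one clock, so they update together.
        state = next_state(*state, bool(go))
        trace.append(state)
    return trace


if __name__ == "__main__":
    for step in simulate([0, 1, 1, 0, 0]):
        print(step)
```

Each next-state equation only looks at a couple of state bits plus the input, which is why this style sometimes maps to fewer product terms than a binary-encoded LUT. Whether it actually saves UDBs in your design is something only the fitter report will tell you.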
I'm getting the same error, but with identical logic drawn a little differently. This design routes fine with 40.4% of UDBs:
[See Attachment 1]
But this design will not route because it uses more than 24 UDBs.
[See Attachment 2]
Obviously the optimization fails a bit. You could do better using 4 LUTs.