Actually I haven't had much time today either, but I have finished the CP1610 version of the URM Interpreter. The advantage of a RISC-type machine with only two instruction formats is that it's fairly easy to code: 100 lines of C or Java, or 200-300 of assembler.
Initial tests of the simplest instruction (mov r0,#0) suggest it executes in 26 machine instructions, about 1,000 URM instructions per second.
The slowest (apart from things like delay instructions!) would be addpl 1(r0),2(r0). In C this is if (lastResult >= 0) Memory[r0+1] += Memory[r0+2]; with tracking of the carry, sign and zero flags of the result.
I haven't single-stepped it, but on the current code it takes 15 more instructions to evaluate the pl condition, 10 more to calculate the right-hand side (2(r0)) and 8 more to calculate the indexed indirection on the left-hand side: 33 extra instructions (59 in all), or about 450 instructions per second. This is an unusual case, though: the worst case on every count.
Branch/Branch Link and their pseudo-operations ret and sys are quicker because the address decoding is much simpler.