Register-based bytecode

I blogged a few months ago about a student who was looking at turning the standard stack-based bytecode into a register-based variant. Well, now the results are in, and they look pretty good. Just to recap, the plan was to take Yhc's backend bytecode (based on the spineless G-machine); to define a new bytecode that implements the same operations, using machine registers instead of a stack for temporary storage; to translate from one bytecode format to the other (including a register allocator); and to implement the corresponding runtime interpreter to execute the bytecodes.

The implementation proceeded in stages, at first using an array to represent a set of registers, then later using a cunning despatch mechanism to get hold of real registers. At this point, performance of the new system was about the same as that of the old system. But the cool thing is that register-based code allows more peephole-optimisation
opportunities. You can re-order instructions, provided they do not touch the same registers, and this code motion brings together pairs of instructions which can cancel each other out, or be fused into a different instruction that is cheaper. Alex also did some extensive profiling to discover common instruction/instruction pairings, and instruction/register pairings, and created specialised versions of those recurring operations. (This kind of profiling had already been done on the original nhc98/yhc stack-based bytecode, gaining about 25% in speed. The naive translation to register code actually threw away all that optimisation work, aiming initially for a simple, regular instruction set.)

So with some optimisation phases added, the register-based bytecode now runs up to 2x faster than stack-based code. There are more potential optimisations waiting in the wings too, although Alex will not have time to implement these.

One of the interesting negative results is that the first translation scheme ended up making the bytecode size larger (about twice the size on average per program), but that this made the "hello world" program 7x slower! Since yhc loads the program code dynamically at runtime, what this means is that there is a non-linear relationship between file size and loading time. But this is only really a concern for very short-running programs: above 1 second, the gains in running time outweigh the losses at loading time.

This page is powered by Blogger. Isn't yours?