Bytecode experiments

I have another undergraduate starting a project this week. Originally he was to look at replacing the spineless G-machine bytecode used in nhc98 with a three-address instruction code. That is still the plan, only now he might use yhc as a platform instead.

The idea is simple. Standard G-machine instructions use the stack heavily as temporary storage. But this is potentially bad for performance, because of all the memory accesses. Using registers for temporary storage would potentially be much faster. This is what native-code compilers do of course. But I would like to continue to have a portable bytecode, running on a small interpretive virtual machine. So part of the problem will be to map temporaries to real registers in the VM itself.
However I expect that to be relatively straightforward, and the major work will be defining and generating a register-based bytecode. I'm thinking of translating the already-generated G-code to register code by simulating the stack.

Ultimately, I'm hoping for a decent speed-up, maybe 2x or better. But I guess there are quite a few pitfalls that could mean we only an improvement of say 20%. Or maybe it will go spectacularly well, and we'll get 3x improvement. We'll find out in a few months time.

There was a rough plan to put a bytecode API together - so programs can read in .hbc [haskell byte code] files and play with them. This way the project could remain entirely outside the source code to the compiler (which is probably a good thing for managability from their point of view).
This comment has been removed by a blog administrator.
It's a good idea but I think there are a lot of issues.

The biggest issue is that you can't take the address of registers or treat them as an array. Thus you have to do something like:

switch (code){
case MOVE_0_1: reg1 = reg0;
case MOVE_0_2: reg2 = reg0;
case MOVE_0_3: reg3 = reg0;

(which might result in rather a lot of code = poor cache performance) or alternatively

switch (code){
case MOVE:
switch (arg[0]){
case 0:
switch (arg[1]){
case 1: reg1 = reg0;
case 2: reg2 = reg0;

(which will be slowed down by all the jumps from the switches).

In both cases you get the problem of
'how many registers do you have available?'. There are lots of registers on a PPC and almost none on an x86.

A related issue is how many registers should be given the C compiler to use the runtime, and how many should be available to the interpretter?

A further problem is that register based instructions are larger and more complex than stack based ones (because stack machines use implicit arguments). In an intepretive framework this is crucial since a lot of the time spent is in decoding and dispatching instructions.

My guess would be that a better approach to getting registers involved is what the O-Caml virtual machine does which is to cache the top value of the stack in a register (see http://pauillac.inria.fr/~lebotlan/docaml_html/english/index.html)
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?