January 25, 2009 / Abe Pralle

How not to make a virtual machine (label-based threading)

I’m a better coder than when I woke up this morning, even if I don’t have anything substantive (like an improved virtual machine) to show for it.  For example, I now know:

  • gcc lets you get the address of a local label by saying “&&labelname”.
  • gcc lets you jump to an address stored in a void pointer (a “computed goto”) by saying “goto *ptr” – both tricks are sketched just after this list.
  • gcc lets you define nested C functions.
  • Visual Studio doesn’t let you do any of these things – but you can work around that with inline assembly.
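
Here’s a quick gcc-only sketch of the first two tricks – taking a local label’s address with “&&” and jumping to it through a void pointer (the label and messages here are just for illustration; the real test program is further down):

#include <stdio.h>

int main()
{
  void* target = &&finish;  /* gcc extension: && takes the address of a local label */

  goto *target;             /* gcc extension: computed goto through a void pointer */

  printf( "this line is skipped\n" );

finish:
  printf( "jumped via computed goto\n" );
  return 0;
}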

It’s all Joe Eager’s fault (I kid, I kid).  Joe mentioned something about threaded code this morning, and soon enough one thing led to another and I was making proof-of-concept virtual machines to test out threaded code with local label jumps – this in contrast to my current “one big switch” architecture and my old threaded code with function pointers.

I was shocked to learn about the “&&labelname” operator and the computed gotos – less shocked when Murphy later mentioned they were gcc-specific extensions.  I thought that was gonna be the nail in the coffin for the possibility of a revamped, faster architecture, but then Murphy also had the idea of using inline assembly in Visual Studio to accomplish the same thing.

Here’s the test program I eventually came up with. It compiles in gcc or in Visual Studio and implements a simple virtual machine that performs 2 billion empty iterations before stopping:

#include <stdio.h>
#include <sys/timeb.h>
#include <sys/types.h>

#ifdef _WIN32
# include <windows.h>
#else
# include <sys/time.h>
# include <unistd.h>
#endif

/* Returns the current wall-clock time in seconds. */
double time_s()
{
#if defined(_WIN32)
  struct __timeb64 time_struct;
  long long time_ms;
  _ftime64_s( &time_struct );
  time_ms = (long long) time_struct.time;
  time_ms *= 1000;
  time_ms += time_struct.millitm;
  return (time_ms / 1000.0);
#else
  struct timeval time_struct;
  long long time_ms;
  gettimeofday( &time_struct, 0 );
  time_ms = (long long) time_struct.tv_sec;
  time_ms *= 1000;
  time_ms += (time_struct.tv_usec / 1000);
  return (time_ms / 1000.0);
#endif
}

#define HALT      0
#define SET_X     1
#define DEC_X     2
#define JUMP_X_NZ 3

/* STORE_ADDRESS saves the address of an opcode-handler label into the dispatch
   table; JUMP_TO_IP fetches the next address from *ip and jumps to it.
   Visual Studio needs inline assembly; gcc can use &&label and computed goto. */
#ifdef _WIN32
#  define STORE_ADDRESS(index,label) __asm lea eax, label __asm mov edx,data\
     __asm mov [edx][index * TYPE data],eax
#  define JUMP_TO_IP() { void* addr = *(ip++); __asm jmp addr }

#else
#  define STORE_ADDRESS(index,label) data[index] = &&label
#  define JUMP_TO_IP() goto **(ip++)
#endif

#define GET_OPCODES 0
#define RUN         1

/* Called once with GET_OPCODES to fill 'data' with the handler label addresses,
   then called again with RUN to execute the threaded program stored in 'data'. */
void execute( void** data, int operation )
{
  int x = 0;
  void** ip;

  if (operation == GET_OPCODES) goto get_opcodes;

  ip = data;
  JUMP_TO_IP();

op_halt:
  return;

op_set_x:
  x = (int)(size_t) *(ip++);  /* the operand is stored directly in the next slot */
  JUMP_TO_IP();

op_dec_x:
  --x;
  JUMP_TO_IP();

op_jump_x_nz:
  {
    void* target = *(ip++);
    if (x) ip = (void**) target;
    JUMP_TO_IP();
  }

get_opcodes:
  STORE_ADDRESS(HALT,op_halt);
  STORE_ADDRESS(SET_X,op_set_x);
  STORE_ADDRESS(DEC_X,op_dec_x);
  STORE_ADDRESS(JUMP_X_NZ,op_jump_x_nz);
}

int main()
{
  void* opcodes[10];
  void* program[20];
  double start_time, end_time;

  execute( opcodes, GET_OPCODES );

  /* Program: x = 2000000000; do { --x; } while (x != 0); halt. */
  program[0] = opcodes[SET_X];
  program[1] = (void*) 2000000000;   /* immediate operand for SET_X */
  program[2] = opcodes[DEC_X];
  program[3] = opcodes[JUMP_X_NZ];
  program[4] = &program[2];          /* jump target: the DEC_X instruction */
  program[5] = opcodes[HALT];

  start_time = time_s();
  execute( program, RUN );
  end_time = time_s();

  printf( "Direct threaded done in %lf seconds.\n", (end_time-start_time) );
  /* 5.17 seconds */

  return 0;
}

On gcc the run time for this versus the same program as One Big Switch was a marked improvement: 5.17 seconds instead of 7.72, a 33% reduction in run time.  My hopes were up – and then dashed again when the Visual Studio results came in: 23 seconds, roughly three times slower than even the One Big Switch version.
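
For reference, the One Big Switch version isn’t shown here; its inner loop would look roughly like the sketch below – plain integer opcodes dispatched through a switch inside a loop (the function name and the integer encoding are just illustrative, not the actual benchmark code):

/* Sketch of switch-based dispatch for the same toy program: 'program' holds
   integer opcodes and operands instead of label addresses, and the jump
   target is an absolute index into the program. */
void execute_switch( int* program )
{
  int  x  = 0;
  int* ip = program;

  for (;;)
  {
    switch (*(ip++))
    {
      case HALT:
        return;

      case SET_X:
        x = *(ip++);
        break;

      case DEC_X:
        --x;
        break;

      case JUMP_X_NZ:
      {
        int target = *(ip++);
        if (x) ip = program + target;
        break;
      }
    }
  }
}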

The difference is that the gcc version caches the IP (instruction pointer) in a register while the VC++ version does not. That optimization could be lost with a full-fledged virtual machine – and in any case I feel like anything I might do to the VC++ version to make it faster would ultimately be too brittle for portable and maintainable code.

So I’ll stick with the VM I have already – which is pretty much done – and get back to finishing up the last bits of Slag gen2!


5 Comments

  1. Timothy Goya / Jan 25 2009 3:03 pm

    How does the fastcall calling convention compare with normal calls in gen1?

  2. Abe Pralle / Jan 25 2009 3:11 pm

    I don’t follow…?

  3. Abe Pralle / Jan 25 2009 3:13 pm

    Oh, versus function pointers? The “one big switch” has already proven to be twice as fast as gen1's function pointer dispatch, so this program would probably run at around 14 seconds on my machine if implemented with function pointers.

  4. Timothy Goya / Jan 25 2009 4:29 pm

    To clarify, I thought that gen1 already uses a function pointer table and indexes into it using the opcode. I was wondering how using the compiler-dependent fastcall calling convention (http://blogs.msdn.com/oldnewthing/archive/2004/01/02/47184.aspx) would affect performance. There’s a very good possibility that using fastcall is actually slower.

    I wasn’t thinking about using function pointers instead of computed goto, but that info is pretty useful anyway. 14 seconds versus 11 seconds doesn’t seem that bad of a tradeoff to sacrifice a bit of speed for maintainability.

    Also, what about JITing? What kind of performance gains could that bring?

  5. Abe Pralle / Jan 26 2009 1:00 am

    So fastcall is a convention that passes args in registers instead of on the stack. That wouldn’t impact my gen1 VM because I don’t pass any args there – the VM state is stored in global variables.
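
    To illustrate – this is just a rough sketch, not the actual gen1 code: with the VM state in globals, the handlers take no arguments at all, so there's nothing for fastcall to put in registers anyway.

    int  vm_x;          /* the VM's one register */
    int* vm_program;    /* start of the program, for absolute jump targets */
    int* vm_ip;         /* instruction pointer */
    int  vm_running;

    void op_halt( void )      { vm_running = 0; }
    void op_set_x( void )     { vm_x = *(vm_ip++); }
    void op_dec_x( void )     { --vm_x; }
    void op_jump_x_nz( void ) { int target = *(vm_ip++); if (vm_x) vm_ip = vm_program + target; }

    /* indexed by opcode: HALT=0, SET_X=1, DEC_X=2, JUMP_X_NZ=3 */
    void (*handlers[4])( void ) = { op_halt, op_set_x, op_dec_x, op_jump_x_nz };

    void run( int* program )
    {
      vm_program = vm_ip = program;
      vm_running = 1;
      while (vm_running) handlers[ *(vm_ip++) ]();
    }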

    Looking back over it, when I say 14 seconds I mean 16 seconds, and that’s 16 seconds versus 8 seconds, not 16 versus 11 (there is no 11!).

    Dynamic compilation (JIT) would be very fast indeed (10x faster than One Big Switch) but that’s a project for another year and a larger team… I’ll stick with my cross-compile to C to get the same boost!
