January 4, 2009 / Abe Pralle

Slag gen2: First results

After spending months rewriting the compiler, weeks on the new virtual machine, and days debugging various low-level errors, I’ve gotten everything stable enough to compile & run my prime number speed test (and consequently a good chunk of the standard library).  The results:

Gen1 virtual machine: 25.40 seconds.

Gen2 virtual machine: 11.87 seconds.

Ergo: wooooooo!

I attribute the bulk of the speed boost to implementing the VM as One Big Switch statement instead of the function pointer table I was using before.

In gen1 the VM execution loop looked like this C++ pseudocode:

  void op_add_int32() { ... }
  void op_sub_int32() { ... }

  // The code is composed of pointers to functions.
  SlagOpFn code[] = { ..., op_add_int32, op_sub_int32, ... };
  for (;;)
    // We use an instruction pointer index variable to look up
    // the next function to call and then we call it.
    code[ip++]();

In gen2 that loop is more like this:

   #define SLAGOP_ADD_INT32 26
   #define SLAGOP_SUB_INT32 27
   int code[] = { ..., 26, 27, ... };
   for (;;)
     switch (code[ip++])
     {
       case SLAGOP_ADD_INT32: ...; continue;
       case SLAGOP_SUB_INT32: ...; continue;
     }

The gen1 approach sounded good at the time, but every op ends up paying behind-the-scenes overhead to set up the function call and clean up afterwards.  Conversely, the gen2 switch seems like it would burn a lot of processor time comparing each opcode against the ~200 possibilities.  But when the case values of a switch run densely from roughly zero to N, the compiler typically creates a jump table of [N+1] entries such that spot [26] contains the address of the “case SLAGOP_ADD_INT32:” code and so on, so dispatch is a single indexed jump and it’s quite fast.
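To make the switch-based loop concrete, here is a minimal sketch of a tiny stack VM in C. It is not the Slag VM: the opcode names and numbers, the fixed 64-slot stack, and the `run_switch_vm` function are all assumptions made up for illustration, and the sketch does no bounds checking.

```c
#include <stdint.h>

/* Hypothetical opcodes for this sketch; not Slag's real numbering. */
enum { OP_HALT = 0, OP_PUSH_INT32 = 1, OP_ADD_INT32 = 2, OP_SUB_INT32 = 3 };

/* Runs `code` until OP_HALT (or an unknown opcode) and returns the
   value on top of the stack, or 0 if the stack is empty.
   Assumes well-formed bytecode: no stack bounds checks are done. */
int32_t run_switch_vm(const int32_t *code)
{
  int32_t stack[64];
  int sp = 0;   /* index of the next free stack slot */
  int ip = 0;   /* instruction pointer: index of the next opcode */

  for (;;)
  {
    /* Dense case values 0..3 let the compiler build a jump table. */
    switch (code[ip++])
    {
      case OP_PUSH_INT32:
        stack[sp++] = code[ip++];   /* the operand follows the opcode */
        continue;
      case OP_ADD_INT32:
        sp--;
        stack[sp - 1] += stack[sp];
        continue;
      case OP_SUB_INT32:
        sp--;
        stack[sp - 1] -= stack[sp];
        continue;
      case OP_HALT:
      default:
        return sp > 0 ? stack[sp - 1] : 0;
    }
  }
}
```

Feeding it the program "push 7, push 5, add, push 2, sub, halt" leaves 10 on top of the stack. Whether a given compiler actually emits a jump table for this switch depends on the compiler and its optimization settings.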

Other notable enhancements in the gen2 VM:

  • Written in straight C instead of C++ to be more easily compatible with more projects (engines, games, and apps).
  • Small and efficient – much smaller footprint than the gen1 VM.
  • Uses some peephole optimization to replace certain bytecode patterns with more efficient patterns as the bytecode is generated from the execution tree code at load time.  For example, a “WRITE_LOCAL_REF #2” followed by a “READ_LOCAL_REF #2” pair is replaced with a “DUPLICATE_REF; WRITE_LOCAL_REF #2” pair.
  • Unused allocation blocks are cached and recycled for a handful of small object sizes – 16, 32, 48, and 64 bytes.  All sorts of small objects with short to medium lifespans – bullets, explosions, general particles, and the like – will now have much less of an impact on memory management performance.
  • Multiple instances of the VM are now easily possible.  Possible uses for future development include thread support and having one Slag program control and interact with another (such as an integrated debugger).
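The peephole replacement mentioned in the list above can be sketched roughly as follows. This is an illustration only, not the gen2 source: the `Instr` struct, the opcode numbers, and `peephole_write_then_read` are hypothetical, instructions are assumed to be fixed-width (opcode plus one argument), and the sketch assumes no jump targets the second instruction of a matched pair.

```c
#include <stddef.h>

/* Hypothetical opcode numbers and instruction layout for this sketch. */
enum { OP_READ_LOCAL_REF = 1, OP_WRITE_LOCAL_REF = 2, OP_DUPLICATE_REF = 3 };

typedef struct { int op; int arg; } Instr;

/* Rewrites every "WRITE_LOCAL_REF #n; READ_LOCAL_REF #n" pair in place
   as "DUPLICATE_REF; WRITE_LOCAL_REF #n" and returns the number of
   pairs replaced.  The replacement has the same length as the
   original, so no jump offsets need to be adjusted. */
size_t peephole_write_then_read(Instr *code, size_t count)
{
  size_t rewrites = 0;
  for (size_t i = 0; i + 1 < count; ++i)
  {
    if (code[i].op == OP_WRITE_LOCAL_REF &&
        code[i + 1].op == OP_READ_LOCAL_REF &&
        code[i].arg == code[i + 1].arg)
    {
      int slot = code[i].arg;

      /* Keep a copy of the ref on the stack, then store the other copy. */
      code[i].op = OP_DUPLICATE_REF;
      code[i].arg = 0;
      code[i + 1].op = OP_WRITE_LOCAL_REF;
      code[i + 1].arg = slot;

      ++rewrites;
      ++i;   /* skip the pair we just rewrote */
    }
  }
  return rewrites;
}
```

Both sequences leave the reference on the stack and stored in the local, but the rewritten pair duplicates the top of the stack instead of reading the local variable back.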

I’m still finishing everything up, but everything’s turning out great so far!



Leave a Comment
  1. Jacob Stevens / Jan 5 2009 9:29 pm

    Awesome! Great stuff! I can’t wait to start using this for Wii and iPhone.

  2. joeedh / Jan 19 2009 6:06 pm

    Cool! Sounds exciting.

  3. joeedh / Jan 24 2009 2:08 pm

    I read about something interesting the other day. It’s called direct threading. It works by speeding up the dispatch for switch statements. There are different forms of it, but basically the idea is this: normally the compiler spits out asm code that does a range check, maps the opcode to a valid range, then does a jump to the right case (or the default case).

    This can actually cause stalling in the CPU, resulting in more overhead than you’d think. Direct threading addresses this by removing the range check; in some forms, the opcodes are modeled to be the jump addresses themselves, so you just have the jump. This works because you can guarantee you always get valid opcodes in the compiler and VM code.

    Also, supposedly adding a manual jump out of each case statement can improve branch prediction behavior, but I’m less sure about that.


  4. Abe / Jan 24 2009 5:48 pm

    Interesting, Joe!

    My original VM used a variant of code threading – all the opcodes were function pointers. After reading some more on direct threading now I’m exploring the idea of doing threading (direct threading, I guess) using local jumps instead of function pointers. Thanks for the idea and I’ll post my results later on!
