Skip to content
September 2, 2008 / Abe Pralle

ETC-to-C cross-compiler

I just finished the major work on the ETC-to-C cross-compiler I’ve been working on!

ETC (Execution-Tree Code) is the bytecode that Slag compiles to.  A couple of months ago I made an initial quick ‘n’ dirty cross-compiler that converted ETC code into “Hybrid VM” code, where the organization was the essentially the same as in the VM and the only difference was that the commands that had been called via function pointers were now listed out in contiguous blocks.

As I’m now working on Doc Fuzzle for the iPhone, I needed to improve my cross-compiler.  I had three goals:

  1. First and foremost, to generate straight C since the iPhone (and Mac in general) is set up to work with C and Objective-C but not C++.
  2. To remove the symbiotic dependence on the VM architecture in order to slim down the final code size.
  3. To change my “assembly-style” stack-based, granular code generation to use regular C parameters and operators instead of always having to pop, combine, and push values during operations.  Partly to boost speed and partly to reduce the size of the generated source code (which was hundreds of thousands of lines).

For the most part, targeting C over C++ was not a big deal in itself.  Most things in the Hybrid VM compiler were already C-style – for example, “st.substring(5)” cross-compiles to “ClassString__substring(st,5)”.  The implementation challenges fell into three other categories:

  1. Keeping track of references for garbage collection.  Whenever the fast heap runs out of memory a collection is triggered; no matter what the program was doing, there has to be a way to trace through all of the local references on the call stack.  I can’t just use a separate reference stack, since to set it up before a call I’d have to go back to evaluating references piecemeal – which would necessitate a value stack again as well.  Also, I couldn’t have local variables be references at all since a garbage collection can change the memory address of an object – with a call like “rock(r1,NewObj(),r2)”, either r1 or r2 is going to already be evaluated and pushed on the OS call stack by the time that NewObj() is called – if it triggers a data collection, r1 (parameter) is going to end up being different (not to mention invalid) from r1 (local var).My solution was to pass pointers to pointers to objects and have a separate “backup” reference stack instead.  So with “rock(r1,r2)”, the references are pointers to pointer to objects.  In addition, at the beginning of the rock() method, both *r1 and *r2 are pushed on the separate stack to make sure they stay visible if a GC happens.  Seems to work!
  2. Working up a replacement for try/catch, since C doesn’t have that.  Murphy mentioned setjmp and longjmp (which I hadn’t heard of before) and so I was able to implement try/catch using those.  setjmp is kinda like the fork command – when it returns it’s either just set up a save point for the first time or it’s just rewound the program back to that save point because of a longjmp call that happened later on.  You can tell what’s going on based on the return value.
  3. Body of work.  Turns out there was just a lot of code in the VM, and extracting it and converting it to C – regardless of how straight-forward it was – took quite a while!

My one speed test so far has been in the form of a modified prime number-finding program.  It’s mostly iteration, but every outer iteration causes several inner iterator objects to be created and it saves the results in an ArrayList of ever-increasing size.  It tests 100k numbers for prime-ness.  Here’s the Slag version:

augment Global
    method is_prime( Int32 num ).Boolean:

      forEach (divisor in {2,3..sqrt(num) step 2})
        if (num % divisor == 0)
          return false

      return true

class Test
    method init:
      local var low  = 100000000
      local var high = 100100000

      local var start_ms = time_ms
      println( "Prime numbers in the range $..$" (low,high) )
      forEach (prime in find_primes(low,high))
        #println( "  $(3) is a prime number." (prime) )
      local var elapsed_ms = (time_ms - start_ms)
      println( "$(.2) seconds" (elapsed_ms/1000.0) )

    method find_primes( Int32 low, Int32 high ).Int32[]:
      local Int32[] primes()
      forEach (n in low..high)
        if (n.is_prime) primes.add( n )
      return primes

And here’s the Java equivalent that works about the same:

import java.util.*;

public class Primes
  static public class RangeIterator implements Iterator<Integer>
    int cur, last, step;

    public RangeIterator( int f, int l, int s )
      cur = f;
      last = l;
      step = s;

    public boolean hasNext()
      return cur <= last;

    public Integer next()
      int result = cur;
      cur += step;
      return result;

    public void remove() {}

  static public class IndirectIterator implements Iterator<Integer>
    RangeIterator cur;
    List<RangeIterator> pending = new ArrayList<RangeIterator>();

    public IndirectIterator( RangeIterator r1, RangeIterator r2 )
      add( r1 );
      add( r2 );

    public void add( RangeIterator iterator )
      if (cur == null) cur = iterator;
      else pending.add( iterator );

    public void adjust_cur()
      if (cur == null) return;
      while ( !cur.hasNext() )
        if (pending.size() == 0)
          cur = null;

        cur = pending.get(0);

    public boolean hasNext()
      return (cur != null);

    public Integer next()

    public void remove(){}

  static public boolean is_prime( int num )
    IndirectIterator itor = new IndirectIterator( new RangeIterator(2,2,1),
        new RangeIterator(3,(int)Math.sqrt(num),2));
    while (itor.hasNext())
      int divisor =;
      if (num % divisor == 0) return false;
    return true;

  static public void main( String[] args )
    new Primes();

  public Primes()
    int low  = 100000000;
    int high = 100100000;

    long start_ms = System.currentTimeMillis();
    System.out.println( "Prime numbers in the range " + low + ".." + high );
    for (int prime : find_primes(low,high))
    long elapsed_ms = System.currentTimeMillis();
    System.out.println( (elapsed_ms/1000.0) + " seconds" );

  public List<Integer> find_primes( int low, int high )
    List<Integer> primes = new ArrayList<Integer>();
    RangeIterator itor = new RangeIterator(low,high,1);
    while (itor.hasNext())
      int n =;
      if (is_prime(n)) primes.add(n);
    return primes;

The results:

  • ETC VM:  26 seconds
  • ETC to Hybrid-VM:  3 seconds
  • ETC to C: 1.35 seconds
  • Java:   1.22 seconds

I’m definitely pleased that the new cross-compiler looks to be 50% faster than before, but I’m also a little surprised that I didn’t beat out Java. Ah well, that’ll give me something to think about, and in the meantime I’m happy to be able to turn Slag programs into fast and portable C!

For your interest, here’s the cross-compiled main (init) method of the Slag program with the original lines as Slag-style comments:

# method init:
static ETCRef* ClassTest__init( ETCRef* _local_this )
  // Test::init() [4]
  ETCRef* ref_frame_ptr;
  ETCRef* local_this = ref_stack_ptr++;
  ETCInt32 local_var_1 = {0};
  ETCInt32 local_var_2 = {0};
  ETCInt64 local_var_3 = {0};
  ETCInt32 local_var_4 = {0};
  ETCRef* local_var_5 = ref_stack_ptr++;
  ETCInt64 local_var_6 = {0};
  *local_this = *_local_this;
  *local_var_5 = NULL;

  ref_frame_ptr = ref_stack_ptr;
  # local var low  = 100000000
  # local var high = 100100000
  # local var start_ms = time_ms
  local_var_1 = ((ETCInt32)100000000);
  local_var_2 = ((ETCInt32)100100000);
  local_var_3 = ClassGlobal__time_ms();

  #println( "Prime numbers in the range $..$" (low,high) )
  ref_stack_ptr = ref_frame_ptr;

  # forEach (prime in find_primes(low,high))
  *local_var_5 = *(AspectList_of_Int32____create_reader__dispatch(
  ref_stack_ptr = ref_frame_ptr;
  while (AspectReader_of_Int32____has_another__dispatch(NULL_CHECK(local_var_5)))
    local_var_4 = AspectReader_of_Int32____read__dispatch(NULL_CHECK(local_var_5));

  # endForEach

  # local var elapsed_ms = (time_ms - start_ms)
  local_var_6 = (ClassGlobal__time_ms() - local_var_3);

  # println( "$(.2) seconds" (elapsed_ms/1000.0) )
    NULL_CHECK(ClassGlobal__format_string_23((((ETCReal64)local_var_6) / 
    ((ETCInt32)' '))),&etc_string_table[12]));
  ref_stack_ptr = ref_frame_ptr;

  // [unwind stack frame - note that compiled init() methods are altered
  // to return their "this" object context as a return value.]
    ETCRef* result = local_this;
    ref_stack_ptr = ref_frame_ptr - 2;
    *ref_stack_ptr = *result;
    return ref_stack_ptr++;


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: