Skip to content
November 11, 2011 / Abe Pralle

A Neat Variable-Byte Encoding for Integers

Bard bytecode files are stored using variable-length integers – since most of the opcodes and string lengths, etc., are seven bits or less, it saves a lot of space overall.  In the past I used a system that was roughly UTF-8-ish in design but with one special modification: since negative one is a fairly common index and operand, I added one to my integer value before encoding it so that numbers -1..126 would be stored as a single byte (and a leading ‘1’ bit would indicate a multi-byte value).

I thought about that design again today and for whatever reason a much better system came to mind.  Here’s the spec:

Leading byte:
  01111111 // 0 - 127
  1000xxxx // 1 byte follows - 12 bits total
  1001xxxx // 2 bytes follow - 20 bits total
  1010xxxx // 3 bytes follow - 28 bits total
  10110000 // 4 bytes follow - 32 bits total
  11xxxxxx // a negative value -64..-1, e.g:
  11000000 // -64
  11111111 // -1
The cool thing is that values −64..127 can be simply truncated to a byte with no other encoding processing required!  Not only that, but 15 bit patterns 10110001 through 10111111 are still available any other special markers you might need.
On an unrelated note – I got majorly busy on a big contract project just after I posted my “One Game a Week” plan, so that’s on hold.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: