A Neat Variable-Byte Encoding for Integers
Bard bytecode files are stored using variable-length integers – since most of the opcodes and string lengths, etc., are seven bits or less, it saves a lot of space overall. In the past I used a system that was roughly UTF-8-ish in design but with one special modification: since negative one is a fairly common index and operand, I added one to my integer value before encoding it so that numbers -1..126 would be stored as a single byte (and a leading ’1′ bit would indicate a multi-byte value).
I thought about that design again today and for whatever reason a much better system came to mind. Here’s the spec:
Leading byte: 01111111 // 0 - 127 1000xxxx // 1 byte follows - 12 bits total 1001xxxx // 2 bytes follow - 20 bits total 1010xxxx // 3 bytes follow - 28 bits total 10110000 // 4 bytes follow - 32 bits total 11xxxxxx // a negative value -64..-1, e.g: 11000000 // -64 11111111 // -1
The cool thing is that values −64..127 can be simply truncated to a byte with no other encoding processing required! Not only that, but 15 bit patterns 10110001 through 10111111 are still available any other special markers you might need.
On an unrelated note – I got majorly busy on a big contract project just after I posted my “One Game a Week” plan, so that’s on hold.
Advertisement
