May 29, 2008 / Abe Pralle

The power of profiling

The other day I noticed that Wii Plasmacore (WiiCore) was using an awful lot of memory just loading a super-simple Slag program. I finally did some runtime profiling on the VM – just printed out how many memory allocations were being made at various points – and found some interesting results.
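For anyone wanting to try the same kind of quick-and-dirty profiling, here is a minimal sketch, assuming a C++ code base where replacing the global `operator new` is acceptable. The counter names and `report_allocations` are hypothetical, not the actual WiiCore instrumentation, and the counters are not thread-safe.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical global counters (assumption: single-threaded use).
static std::size_t g_alloc_count = 0;
static std::size_t g_alloc_bytes = 0;

// Replacement for the global allocation function. The default
// operator new[] forwards here, so array allocations are counted too.
void* operator new(std::size_t size) {
    ++g_alloc_count;
    g_alloc_bytes += size;
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    return p;
}

// Matching deallocation functions so memory goes back through free().
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Print the running totals at any point of interest, e.g. right
// after a program has finished loading.
void report_allocations(const char* label) {
    std::printf("%s: %zu allocations, %zu bytes\n",
                label, g_alloc_count, g_alloc_bytes);
}
```

Calling `report_allocations("after load")` at a few checkpoints is enough to see which phase is responsible for most of the allocations. Note that the byte total only counts requested sizes, not the memory manager's own overhead.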

Turns out that loading a Slag program took 37,000 memory allocations and about 22 MB. Interestingly enough those two figures aren’t related. The bulk of the 37,000 allocations – and a fairly small amount of corresponding memory – came from an area that was fairly innocuous in my mind: the generation of complete signature strings for all the methods (e.g. “Global::min(Int32,Int32).Int32”). I coded up a different system for the method signatures (which are only needed for hooking up native callback functions with the appropriate Slag calls) and cut the number of allocations down to 12,000.

About half of the memory usage came from allocating a default 10 MB of “fast heap” (which I quickly toned down to 2 MB for the Wii); most of the other half came from a mix of two things: the 3-dimensional aspect call lookup table ([reference type index][aspect type index][method index]) and the following bug in my ArrayList class:

data = new DataType[ count * sizeof(DataType) ];

Yeah… the array size is calculating the number of bytes when it should be the number of elements. So say type “DataType” is 8 bytes per value and we wanna allocate 100 values. We need 800 bytes, but this bug would give us 6,400 bytes instead. Each array was sizeof(DataType) times bigger than necessary!!
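The arithmetic can be checked with a short sketch. `Value8`, `buggy_bytes`, and `fixed_bytes` are names made up for illustration, standing in for an 8-byte `DataType`; they are not part of the real ArrayList class.

```cpp
#include <cstddef>

// Hypothetical 8-byte element type standing in for DataType.
struct Value8 { unsigned char raw[8]; };
static_assert(sizeof(Value8) == 8, "example assumes an 8-byte element");

// Bytes requested by the buggy expression new T[count * sizeof(T)]:
// the element count is already multiplied by sizeof(T), and new[]
// multiplies by sizeof(T) again.
template <typename T>
constexpr std::size_t buggy_bytes(std::size_t count) {
    return (count * sizeof(T)) * sizeof(T);
}

// Bytes requested by the corrected expression new T[count].
template <typename T>
constexpr std::size_t fixed_bytes(std::size_t count) {
    return count * sizeof(T);
}

static_assert(buggy_bytes<Value8>(100) == 6400, "buggy: 800 elements");
static_assert(fixed_bytes<Value8>(100) == 800, "fixed: 100 elements");
```

The fix itself is just to pass the element count, `data = new DataType[ count ];`, and let `new[]` do the byte math.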

So now I’m getting 12,000 allocations using 6.3 MB. Much better! There’s still one more thing that can be significantly improved: I notice there are a lot of small 8, 16, 32, and 64-byte allocations that end up taking 64 bytes per allocation with the built-in memory manager. First off, the Wii wants most things to be 32-byte aligned (for the benefit of Direct Memory Access circuits that bypass the CPU), so anything less than 32 bytes gets rounded up to 32. Second, the memory manager uses an extra 16 bytes per allocation, so of course that gets rounded up to another 32. My current project, then, is to work up a little memory manager that more efficiently handles small allocations.
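One common way to cut that per-allocation overhead is a fixed-size block pool. The sketch below is only an illustration of that general technique, not the memory manager I ended up writing. `SmallBlockPool` and its methods are hypothetical names, and it only aligns blocks to pointer size (a Wii version would presumably need a 32-byte-aligned slab where DMA is involved).

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical fixed-size block pool: one slab is carved into equal
// blocks, and free blocks are chained through their own storage, so
// there is no separate per-allocation header.
class SmallBlockPool {
public:
    SmallBlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size ? block_size : 1, sizeof(void*))),
          slab_(static_cast<unsigned char*>(
              std::malloc(block_size_ * block_count))),
          free_list_(nullptr),
          free_count_(0) {
        if (!slab_) return;
        // Push in reverse so the first allocate() returns the first block.
        for (std::size_t i = block_count; i-- > 0;) {
            push(slab_ + i * block_size_);
        }
    }
    ~SmallBlockPool() { std::free(slab_); }
    SmallBlockPool(const SmallBlockPool&) = delete;
    SmallBlockPool& operator=(const SmallBlockPool&) = delete;

    // Pop a block off the free list; nullptr when the pool is exhausted.
    void* allocate() {
        if (!free_list_) return nullptr;
        void* block = free_list_;
        free_list_ = *static_cast<void**>(block);
        --free_count_;
        return block;
    }

    // Return a block previously obtained from allocate().
    void release(void* block) {
        if (block) push(block);
    }

    std::size_t free_blocks() const { return free_count_; }
    std::size_t block_size() const { return block_size_; }

private:
    static std::size_t round_up(std::size_t n, std::size_t multiple) {
        return (n + multiple - 1) / multiple * multiple;
    }
    // Store the current list head inside the block, then make it the head.
    void push(void* block) {
        *static_cast<void**>(block) = free_list_;
        free_list_ = block;
        ++free_count_;
    }

    std::size_t block_size_;
    unsigned char* slab_;
    void* free_list_;
    std::size_t free_count_;
};
```

With one pool per size class (say 8, 16, 32, and 64 bytes), each small allocation costs its rounded-up block size and nothing more. The trade-off is that the pool's capacity is fixed up front.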

