May 6, 2009 / Abe Pralle

iPhone drawing speed

I’ve been doing some initial drawing tests on the iPhone with so-so results.

Turns out it uses a graphics chip that renders different tile sections of the screen in parallel.  I couldn’t find any info on how big the tiles are offhand.

Here are my very informal results. [Clarification: I’m drawing tile/sprite images as a series of individual calls, each with its own 2D transform]

In 1/60th of a second, the iPhone can draw about:
24 textured polygons in the same spot, quarter-screen size or less.
OR 75 spaced-out polys (32×32 in this case)
OR 6 full-screen polys.
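
A quick sanity check on those numbers, as a small C sketch. It assumes the original iPhone’s 320×480 screen and treats "quarter-screen" as exactly one quarter (the figure above says "or less", so this is an upper bound). The function names are made up for illustration.

```c
/* Assumed screen size of the original iPhone, in pixels. */
enum { SCREEN_W = 320, SCREEN_H = 480 };

/* Pixels touched per frame by `count` polys that each cover
   1/`fraction_denominator` of the screen. */
static long pixels_for_screen_fraction(long count, long fraction_denominator)
{
    return count * ((long)SCREEN_W * SCREEN_H / fraction_denominator);
}

/* Pixels touched per frame by `count` square polys of side `side`. */
static long pixels_for_squares(long count, long side)
{
    return count * side * side;
}
```

Under those assumptions, 24 quarter-screen polys and 6 full-screen polys both touch 921,600 pixels per frame, while 75 spaced-out 32×32 polys touch only 76,800. That would be consistent with the large cases being limited by pixel fill and the small case by per-call overhead, though this is an inference from the arithmetic, not something I measured directly.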

Notes:
– This is with a tight drawing-only loop.  Logic & sound add a bit of overhead – not an excessive amount, but then again every bit hurts.
– Things that didn’t significantly improve the speed: turning off texturing, turning off back-buffer clear, turning off alpha blending, using texture formats with fewer bytes per pixel.

In light of this I’m gonna tweak Plasmacore for iPhone so that it’s CYOB (Clear Your Own Backbuffer).  Not to save time clearing the backbuffer, but to allow you to get tricky (if you want) and only redraw altered portions of the screen.  The good news is that Plasmacore (as usual) will still update() at 60fps no matter how slow the draw() is going, so things might get a little rougher but they won’t get slower.
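
The "update at 60fps no matter how slow the draw is" idea can be sketched as a fixed-timestep accumulator. This is a generic C illustration, not Plasmacore’s actual code: `UpdateClock`, `clock_advance`, and the catch-up cap are hypothetical names and choices.

```c
/* Fixed-timestep bookkeeping: decides how many update() calls are owed
   after a frame that took `elapsed_ms` to draw. */
typedef struct {
    double accumulator_ms;  /* time not yet consumed by updates */
    double step_ms;         /* length of one update tick, e.g. 1000.0 / 60 */
    int    max_catch_up;    /* cap so one very slow frame cannot stall the loop */
} UpdateClock;

static int clock_advance(UpdateClock *clock, double elapsed_ms)
{
    int updates = 0;
    clock->accumulator_ms += elapsed_ms;
    while (clock->accumulator_ms >= clock->step_ms &&
           updates < clock->max_catch_up) {
        clock->accumulator_ms -= clock->step_ms;
        ++updates;
    }
    /* Drop any backlog beyond the cap rather than carrying it forward. */
    if (updates == clock->max_catch_up &&
        clock->accumulator_ms >= clock->step_ms) {
        clock->accumulator_ms = 0.0;
    }
    return updates;
}
```

With a step of 1000.0 / 60 ms, a draw that takes about 35 ms would be followed by two update() calls before the next draw, so game logic keeps its pace while the frame rate drops.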

A final tip: avoid drawing font-based text and use prerendered words instead!


15 Comments

  1. Jacob Stevens / May 6 2009 10:39 pm

    Interesting. Are those polys being rendered through slag, or purely through native code?

  2. Jacob Stevens / May 6 2009 11:04 pm

    This guy has a tutorial and is claiming 310,000 polys per second, or over 5,100 per frame. Is he taking a totally different approach?

    http://www.sunsetlakesoftware.com/2009/01/13/opengl-es-catransform3d

  3. Abe Pralle / May 7 2009 12:12 am

    Through cross-compiled Slag.

    So the 5166 per frame is 2583 quadrangles, which is 34x more than I’m getting.

    I couldn’t really tell what that guy was drawing (it looked like 2 quadrangles in the screenshot). I’ll guess for his speed tests he’s probably drawing a number of big triangle-strip meshes with some overhead for every few dozen or hundred triangles whereas I’m drawing individual tile/sprite images with more overhead for every 2 triangles, including various 2×3 and 2×1 2D matrix multiplications per image on the C side for a total of 40 multiplications and 32 additions per image drawn (plus another transform internal to OpenGL).

    Just “manually” calculating a transform with a handle, a screen center, a rotation, and a scale is 24 multiplications and 32 additions per image, and that’s without the extra transform I’m currently using to bypass OpenGL ES’s floating point roundoff error I mentioned before.

    I’ll investigate how much relative time the transformations take and see if there would be some real benefits to having different drawing calls for unrotated, unscaled images. Maybe even see what Core Animation offers directly, without going through OpenGL…

    • Abe Pralle / May 7 2009 12:24 am

      Oh – and looks like that guy’s optimizations are to maintain his own model-view matrices for easier access rather than having to ask OpenGL, which was a bottleneck I guess. I’m already keeping my own matrices anyways!

    • Brad Larson / May 8 2009 5:17 pm

      I’m the author of the above-linked post. If you’d like more detail as to how I arrived at the numbers I quote in that post, I’d recommend reading my original article on OpenGL ES on the iPhone at

      http://www.sunsetlakesoftware.com/2008/08/05/lessons-molecules-opengl-es

      Basically, I counted the number of triangles within the molecular models I was pushing to the screen, ran a loop where I rotated the model by a degree each frame, and found how long it took to render 100 frames. The triangles in this example were colored, with no textures applied, and lighted using smooth shading.

      If you care to repeat the performance test, you can download the source code to Molecules at

      http://www.sunsetlakesoftware.com/molecules

      and uncomment the “#define RUN_OPENGL_BENCHMARKS” line. It will run this benchmark for each molecular model you load and dump the results to the console.

      I do use a vertex buffer object for storage of my vertices (as I explain in the post above) because I found that constantly sending geometry to the GPU was a severe bottleneck. You can observe this by running Instruments on your rendering test and see where your application spends most of its time. It was especially useful in my case, because the molecular structures only need to be loaded into memory once and don’t change during the rendering operations.

      • Abe Pralle / May 8 2009 7:25 pm

        Cool, thanks Brad!

  4. Jacob Stevens / May 7 2009 7:58 am

    Looking around on various message boards, one consistent piece of advice I hear is to use a single vertex buffer and a large texture containing every piece of art. Then you can use a single draw call to render the scene.

    Is there any easy way to test if that’s faster? Obviously that would require some rework and would have significant limitations, but it might be worth it.

    Here are some interesting message threads on the subject:

    http://www.iphonedevsdk.com/forum/iphone-sdk-game-development/10129-opengl-2d-sprite-performance.html

    http://stackoverflow.com/questions/421969/performance-and-background-images-for-opengl-es-iphone

    I’m thinking we should be able to push the sprite count into at least the hundreds. Field Runners, iDracula, and Zombieville USA seem to be pushing a pretty sizable number of objects.

    Also, a library called cocos2d-iphone seems to be pretty popular. It has a bunch of features we don’t need, and I imagine it’s built on top of OpenGL ES anyway, but it might help to peek at the source code:

    http://code.google.com/p/cocos2d-iphone/

    • Abe Pralle / May 7 2009 10:27 am

      Yeah, the vertex buffer + texture atlas (“single texture sheet”, anyone?) plus one draw call sounds like it might do the trick. iPhone max texture size is 1024×1024, so that gives us room for 1024 32×32 images (or 256 64×64 images) at once – should be plenty.
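
The atlas arithmetic can be sketched in C like this, assuming a square atlas divided into a uniform grid of square cells with no padding between them. `UVRect`, `atlas_capacity`, and `atlas_cell_uv` are hypothetical names for illustration.

```c
/* Texture coordinates of one atlas cell, in the 0..1 range. */
typedef struct { float u0, v0, u1, v1; } UVRect;

/* Number of cell_size x cell_size images that fit in a square atlas. */
static int atlas_capacity(int atlas_size, int cell_size)
{
    int per_row = atlas_size / cell_size;
    return per_row * per_row;
}

/* UV rectangle of cell `index`, counting left to right, then row by row. */
static UVRect atlas_cell_uv(int atlas_size, int cell_size, int index)
{
    int per_row = atlas_size / cell_size;
    int col = index % per_row;
    int row = index / per_row;
    UVRect r;
    r.u0 = (float)(col * cell_size) / (float)atlas_size;
    r.v0 = (float)(row * cell_size) / (float)atlas_size;
    r.u1 = (float)((col + 1) * cell_size) / (float)atlas_size;
    r.v1 = (float)((row + 1) * cell_size) / (float)atlas_size;
    return r;
}
```

For a 1024×1024 atlas this gives a capacity of 1024 cells at 32×32 and 256 cells at 64×64, matching the figures above.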

      In addition, I’ve realized that it’d be super-simple to have a “TileMap” with a single transform that applies to the shared vertices of hundreds of orthogonal tile images inside.

      I’ll pump that speed up!!

      • Jacob Stevens / May 7 2009 10:43 am

        Right on!

        If the single texture method improves performance significantly, what are you thinking for the API? Part of me thinks it would be cool if “it just works” and plasmacore automatically makes supertextures for you and combines the draw calls transparently.

        The other part of me thinks that ultimately it might be nice to have more control over the process, so we can have some flexibility with blend modes and texture coordinate tricks if need be. Something like:

        start_draw_buffer(superTexture, blendmode)

        add_quad(vertices, texturecoords)
        add_quad(vertices, texturecoords)
        add_triangle(vertices, texturecoords)

        end_buffer()
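
A rough C sketch of what the accumulating side of such an API might look like. Everything here is hypothetical (`QuadBatch`, `batch_begin`, `batch_add_quad`, the fixed capacity); it only shows quads being expanded into one contiguous triangle list so they can be submitted together.

```c
#define MAX_QUADS 256

/* One interleaved vertex: position plus texture coordinate. */
typedef struct { float x, y, u, v; } Vertex;

typedef struct {
    Vertex vertices[MAX_QUADS * 6];  /* two triangles per quad */
    int    vertex_count;
} QuadBatch;

static void batch_begin(QuadBatch *batch)
{
    batch->vertex_count = 0;
}

/* `corners` are ordered top-left, top-right, bottom-right, bottom-left.
   Returns 1 on success, 0 if the batch is full. */
static int batch_add_quad(QuadBatch *batch, const Vertex corners[4])
{
    static const int order[6] = { 0, 1, 2, 0, 2, 3 };
    int i;
    if (batch->vertex_count + 6 > MAX_QUADS * 6) return 0;
    for (i = 0; i < 6; ++i) {
        batch->vertices[batch->vertex_count++] = corners[order[i]];
    }
    return 1;
}
```

Ending the batch would then bind the shared texture once, point `glVertexPointer` and `glTexCoordPointer` at the interleaved array with a stride of `sizeof(Vertex)`, and issue a single `glDrawArrays(GL_TRIANGLES, 0, vertex_count)`.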

        What are your thoughts?

        Jacob

    • MooCow / Sep 26 2010 5:06 am

      Field Runners, iDracula, and Zombieville USA seem to be pushing a pretty sizable number of objects

      * I am sure the sprites will be transformed in the CPU, written to a vertex buffer and sent to the GPU in one call where possible.

      • Abe Pralle / Sep 26 2010 11:20 am

        Yeah… since this original post, I’ve found that minimizing the amount of texture switching (by compiling images into large texture sheets) gives the significant speed boost we needed. Plus most people are using faster iPhones (3GS/4) than the original iPhone I wrote this post about.

  5. Abe Pralle / May 10 2009 1:19 pm

    @Jacob: I think we’ll be able to have the best of both worlds. I’m gonna start work on an “ImageCache” that returns an Image given a filename or a Bitmap and manages the texture sheet. But you can still manipulate the relative uv coordinates of the individual images – you should be able to do anything you could do before!

  6. Jonathan / Nov 13 2009 5:50 pm

    Did you ever implement the “clear your own back buffer” option? I’m having a problem where I actually NEED to clear my own back buffer (I want to draw only what changed), but it seems you can’t rely on the buffer’s contents staying the same on the iPhone.

    There’s an option to set a state so it retains the backing, but I’m getting some strange errors when I attempt that. Part of the image isn’t drawn (it’s an image built up using points; this is a 2D app).

    Were you able to control the clearing of the back-buffer without using the retainedbacking option?

    • Abe Pralle / Nov 13 2009 7:04 pm

      The option is in my framework, but I’ve never really tested it – I always clear the backbuffer.

      BTW, messing with Android these days, I’ve found that the GL ES “draw texture” extension is MUCH faster than textured triangles for square orthogonal images. Haven’t tried getting alpha blending working with it yet – not sure if that’s supported. But here’s the code to do it (again: on Android – gonna try popping it on iPhone soon):

      GLint src_rect[4];
      src_rect[0] = (GLint) (uv_a.x * texture_width); // left
      src_rect[1] = (GLint) (uv_b.y * texture_height); // bottom
      src_rect[2] = (GLint) ((uv_b.x - uv_a.x) * texture_width); // width
      src_rect[3] = (GLint) -((uv_b.y - uv_a.y) * texture_height); // height (negative flips the crop vertically)

      // these must be disabled
      glDisableClientState(GL_COLOR_ARRAY);
      glDisableClientState(GL_VERTEX_ARRAY);
      glDisableClientState(GL_TEXTURE_COORD_ARRAY);

      glActiveTexture( GL_TEXTURE0 );
      glEnable(GL_TEXTURE_2D);
      glBindTexture( GL_TEXTURE_2D, texture_id );

      glTexParameteriv( GL_TEXTURE_2D, GL_TEXTURE_CROP_RECT_OES, src_rect );

      double y_pos = (app_info.display_height - size.y) - pos.y;
      glDrawTexfOES( (GLfloat) pos.x, (GLfloat)y_pos, 1.0f, (GLfloat) size.x, (GLfloat) size.y );
      glDisable(GL_TEXTURE_2D);

      • Jonathan / Nov 14 2009 10:46 am

        Awesome.. I’ll give that a shot and let you know how it goes.

        I know it’s not the most efficient way, but it’s essentially a 2D surface that I render to, so I just do GL_POINTS in an interleaved array to draw the pieces of the screen that changed. I do rely on the contents being retained between flips because it’s just a small part that I update.

        For some reason when it has to draw the entire screen it seems that Open GL just doesn’t display some points. It’s always the same ones. I even tried breaking the screen into tiles and always drawing a full tile (if it’s dirty) and for some reason OpenGL STILL isn’t rendering in some spots (it draws a partial square). I only see this behavior when I use the retained backing.
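
For reference, the dirty-tile bookkeeping described here can be sketched in C roughly as follows. The 32-pixel tile size, the 320×480 screen, and all names are assumptions for illustration; it only tracks which tiles need redrawing and does not touch OpenGL.

```c
#include <string.h>

enum {
    TILE_SIZE = 32,
    SCREEN_W = 320, SCREEN_H = 480,          /* assumed screen size */
    TILES_X = SCREEN_W / TILE_SIZE,
    TILES_Y = SCREEN_H / TILE_SIZE
};

typedef struct { unsigned char dirty[TILES_Y][TILES_X]; } DirtyGrid;

static void grid_clear(DirtyGrid *g)
{
    memset(g->dirty, 0, sizeof g->dirty);
}

/* Marks every tile overlapped by the pixel rectangle [x, x+w) x [y, y+h). */
static void grid_mark_rect(DirtyGrid *g, int x, int y, int w, int h)
{
    int tx0, ty0, tx1, ty1, tx, ty;
    if (w <= 0 || h <= 0 || x + w <= 0 || y + h <= 0) return;
    tx0 = x / TILE_SIZE;            ty0 = y / TILE_SIZE;
    tx1 = (x + w - 1) / TILE_SIZE;  ty1 = (y + h - 1) / TILE_SIZE;
    /* Clamp to the grid so partly off-screen rectangles are safe. */
    if (tx0 < 0) tx0 = 0;
    if (ty0 < 0) ty0 = 0;
    if (tx1 >= TILES_X) tx1 = TILES_X - 1;
    if (ty1 >= TILES_Y) ty1 = TILES_Y - 1;
    for (ty = ty0; ty <= ty1; ++ty)
        for (tx = tx0; tx <= tx1; ++tx)
            g->dirty[ty][tx] = 1;
}

/* Number of tiles that would need a full redraw this frame. */
static int grid_count_dirty(const DirtyGrid *g)
{
    int count = 0, tx, ty;
    for (ty = 0; ty < TILES_Y; ++ty)
        for (tx = 0; tx < TILES_X; ++tx)
            count += g->dirty[ty][tx];
    return count;
}
```

Each frame, changed regions are marked, every dirty tile is redrawn in full, and the grid is cleared again; this only works if the previous frame’s contents are still in the buffer.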

        I’ll give using textures a shot, though, and see if it a) works correctly, and b) doesn’t have the strange problems I’m seeing with GL_POINTS.

        Thanks again!
