May 30, 2008 / Abe Pralle

Low-level coding part 1: Pointers

Introduction

So you wanna be a low-level programmer?

When I was starting out programming I was in awe of low-level coders and definitely felt insecure about my own potential in that area. While I felt I could sort of feel my way through high-level code, the requisite information and skills for low-level coding (assembly, OS API, low-level C/C++, etc.) just seemed so esoteric and random – I didn’t know how anyone could figure that out.

20 years later I can say… well, low-level coding IS esoteric and random! It’s just a big grab bag of random weird knowledge. The important thing I realized though is that nobody just uses their intuition to break into it. You wanna program in a new operating system or on a new kind of processor? You have to just get an instruction manual.

Then you struggle through some concepts for a while – the biggest problem being you don’t know what you don’t know – you hack together some simple test programs to verify things are working like you think. Then you write something larger in scope – but it doesn’t work! So you start debugging… experience helps, good intuition helps, but sometimes all the experience in the world won’t help you from getting stuck on something for days. Then it feels so good when you finally figure it out… it’s usually either something you didn’t understand properly after all or just a logical error that you overlooked.

Both in low-level programming and with coding in general, the fundamental thing that separates the men from the boys is having the grit to really go track down those elusive, sometimes subtle bugs. As Steve Pavlina recommends, make the decision to develop Zero Defect software – that doesn’t mean there are no flaws (which is impossible to guarantee), it just means there are no flaws you know about.

With all that in mind, let’s get our hands dirty by looking at pointers in C++. Note: you should already be comfortable with high-level programming. Also: you should be able to do all this on the Mac in Xcode, but I don’t know the details.

Exercise 1: Install Visual Studio .NET Express 2008

http://www.microsoft.com/express/download/#webInstall

VS 2005 is fine too – that’s what I’m using.

Exercise 2: Hello World

Create a test project (Win32 Console Application, not using precompiled headers) containing the following code:

#include <iostream>
using namespace std;

int main()
{
    cout << "Hello World!\n";

    cin.get(); // wait for the Enter key before ending
    return 0;
}

Compile it (F7) and run it (F5) and ensure that a window pops up with “Hello World” in it.

Exercise 3: Pointers

Now try the following:

#include <iostream>
using namespace std;
int main()
{
  int num = 5;  // num is a regular integer
  cout << "Using 'num'\n";
  cout << "Address of num: " << &num << endl;
  cout << "Memory location " << &num << " contains " << num << endl;
  int* ptr;  // ptr is a pointer to an integer
  ptr = &num;
  cout << "\nUsing 'ptr'\n";
  cout << "Address of ptr: " << &ptr << endl;
  cout << "Memory location " << &ptr << " contains " << ptr << endl;
  cout << "Memory location " << ptr << " contains " << *ptr << endl;
  num++;
  (*ptr)++;
  cout << "\nAfter changes\n";
  cout << "num:" << num << " *ptr:" << *ptr << endl;
  cin.get(); // wait for the Enter key before ending
  return 0;
}

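You should get three groups of output: the address and value of ‘num’, then the address and contents of ‘ptr’ (its contents match the address of ‘num’) along with the value it points to, and finally “num:7 *ptr:7”. The exact addresses will vary from machine to machine.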

Use the debugger to view that same memory location. Put a breakpoint on the line after “ptr = &num;” by clicking on it and pressing F9. Run the program again and it’ll stop once it gets to that line. Bring up a memory viewing window with Debug→Windows→Memory→Memory1. Type in the address of ‘num’ that’s already printed out, e.g. 0x12ff28. Now step through the program line by line by tapping F10, noting the changes to that memory location with each step.

Exercise 4: Arrays

Now try this program:

#include <iostream>
using namespace std;

int main()
{
    int nums[6] = {5, 6, 0, 8, 9, 0};
    cout << "nums: " << nums << endl;
    cout << nums[0] << endl;   // 5
    cout << *nums << endl;     // 5
    cout << *(nums+1) << endl; // 6

    int* nums2 = nums + 3;
    cout << nums2[0] << endl;  // 8
    cout << nums2[1] << endl;  // 9

    cin.get(); // wait for the Enter key before ending
    return 0;
}

Note:

  • In most expressions, the name of an array is automatically converted to a pointer to the memory location where the first element of the array is stored.
  • Take a look at the memory location that prints out. Notice the first 4 bytes are “05 00 00 00”. Intel processors store data in Low-High (or “little endian”) byte order: the least significant byte comes first. The hex number 0x12345678 is going to be stored in memory as the byte sequence “78 56 34 12”.
  • For any pointer, “ptr[n]” is equivalent to saying “*(ptr+n)”.
  • Math on pointer variables uses “pointer arithmetic”. Consider “*(nums+1)”. That gives us the second number (6), but that number actually lives in memory 4 bytes beyond the first number (5), since an integer is 4 bytes here. So adding “1” to an integer pointer actually adds the number of bytes in an integer to the pointer address.

Exercise 5: Char Arrays

Try this program:

#include <iostream>
using namespace std;

int main()
{
    char st[6] = {65, 66, 0, 68, 69, 0};
    cout << "st:" << st << endl;
    cout << "st[2]:" << (int) st[2] << endl;  // 0
    cout << "st[3]:" << (int) st[3] << " / " << st[3] << endl;  // 68 / D

    char* st2 = st + 3;
    cout << "st2:" << st2 << endl;

    cin.get(); // wait for the Enter key before ending
    return 0;
}

Note:

  • There’s special behavior when you print out a character pointer. The print routine assumes that it’s the first character in a set of characters and prints out each one until it reaches a zero character, also called a null character or null terminator.
  • Try code that puts a “C” in the correct position and prints out “st” again – it should print out the full “ABCDE”.
  • When you print out a literal string like “hello”, the compiler puts the characters somewhere in memory, appends a null character, and replaces the literal string with a character pointer value. So the following code fragments are equivalent:
    cout << "hello";
    
    char st[6] = { 'h', 'e', 'l', 'l', 'o', 0 };
    char* ptr = st;
    cout << ptr;
  • There’s nothing to stop you from setting a second pointer to point to some offset from the start of an original pointer!

Exercise 6: Processing Packed Strings

Here’s one to do on your own.

  • Set a “char* data” variable to point to this string: “abc\0de\0fgh\0ijklm” (on newer compilers, declare it as “const char* data”). Each “\0” gives you a single null character. If you print it out you’ll just see: “abc”.
  • Declare an array of 4 char* values (“char* strings[4]”) and set each one to point to the start of a different string within the original string. Do this with literal offsets first (data, data+4, etc.), then do it programmatically.
  • Print out the 4 different strings (you should see: abc, de, fgh, ijklm).

Exercise 7: Memory Corruption

Let’s take a look at one of the most infamous kinds of errors in C/C++ programs: memory corruption. This is caused when your program uses pointers to accidentally change memory used for a different variable than you were intending to affect. All your high-level notions of logical debugging go right out the window! Run this:

#include <iostream>
using namespace std;

struct Danger
{
    int data[2];
    int num1;
    int num2;
};

int main()
{
    Danger d;
    d.num1 = 1;
    d.num2 = 2;

    d.data[0] = 5;
    d.data[1] = 6;
    d.data[2] = 7;

    cout << "data[0]: " << d.data[0] << endl;
    cout << "data[1]: " << d.data[1] << endl;
    cout << "data[2]: " << d.data[2] << endl;
    cout << "num1:    " << d.num1 << endl;  // this is 7!
    cout << "num2:    " << d.num2 << endl;

    cin.get(); // wait for the Enter key before ending
    return 0;
}

The cause of the problem isn’t too hard to spot in this case, but let’s go through a couple of standard debugging techniques anyways.

First, position a breakpoint on the first line after num1 and num2 have been set to known values: “d.data[0] = 5;”. Run the program. When it hits the breakpoint, type “&d” in the Memory 1 address box – Visual Studio is nice and smart like that. Now step through the remaining commands, observing how each command affects memory.

Here’s another approach – try it too. Put the breakpoint in the same location and find out the address of ‘d’ (say 0x12ff1c). Then step the debugger one command to see where ‘d.num1’ is being stored (say 0x12ff24). Set a new Memory Breakpoint (Debug→New Breakpoint→New Data Breakpoint…) at 0x12ff24 with size=4 bytes. Then continue the program by pressing F5. The debugger will stop again on the command that changes the memory used by “d.num1”!

Exercise 8: Casting Pointer Types

Here’s an idea which can be hard to wrap your head around in the beginning: the bytes of data in an executable program are fundamentally typeless. Any given 4 bytes could be 4 characters, 1 integer, 1 float, 1 bool, a pointer to integer, a pointer to char, or anything really.

All the types that we deal with while writing a program are remembered by the compiler just long enough to spit out the correct assembly language commands for the current type. For example “n+=1” means “add 1 to n” if n is an integer but it means “add 4 to n” if n is a pointer-to-integer on a 32-bit system!

You can cast any pointer type to any other pointer type. This doesn’t actually change the data at all – it just changes the type of data the compiler thinks it’s dealing with. This is really powerful because it means the same block of bytes can store different kinds of data at different points in your program. Here’s a code fragment that uses the same char array to store either 4 32-bit floats or 4 32-bit integers:

char chardata[16];

int* intdata = (int*) chardata;
intdata[0] = i1; intdata[3] = i4;

float* floatdata = (float*) chardata;
floatdata[0] = f1; floatdata[3] = f4;

So here’s my programming challenge: given an existing float variable “f”, write a single line of code that assigns an integer “intval” to be the integer value of the 32 bits comprising the float. For example, if “f” is “3.4” as a float, those same bits will make the value “1079613850” when interpreted as an integer.

Odds & ends

  • There’s a special kind of pointer type called “void*” (pronounced void star; also just called void pointer). It’s sort of like an Object reference in Java: a generic pointer to memory. Using it is kind of like saying one of two things, either “I’m pointing to something here; I have no freaking clue what it is, nor do I care” or “here’s your pointer to newly allocated memory; cast it to whatever type you need”. For example, a vector in early Java stored things as generic Object references, and likewise if you programmed a vector in C you might just store things as generic void pointers.
  • When declaring a pointer, technically the asterisk belongs with the variable name rather than the base type. For example, declaring “int* a, b;” is the same as “int *a, b;” and in both cases “a” is a pointer-to-int and “b” is just a regular int. This is a bit irregular because to cast a pointer to another pointer type you do associate the asterisk with the target type. In any case “int” is a fundamentally different type than “int*”, so many prefer to associate the asterisk with the type for that reason.
  • Using pointers allows a function to have multiple return values. For example, here’s a call to a function that accepts an angle (2.0 radians) and returns the cosine and sine values of that angle:
    double sinval, cosval;
    get_components( 2.0, &cosval, &sinval );  //results stored in cosval & sinval

    and here’s the definition:

    void get_components( double rads, double *c, double *s )
    {
      *c = cos(rads);
      *s = sin(rads);
    }
  • Every so often you’ll want to send a pointer to a function and have the function set up the pointer to somewhere else. It’s just an extension of the previous idea – you’ll need to pass in the address of the pointer and accept a pointer to a pointer as a parameter, e.g.:
    char* dataptr;
    if (get_data(&dataptr)) ...
    ...
    bool get_data( char** ptrptr )
    {
      *ptrptr = new char[256];
      return true;
    }
  • Many APIs will typedef an arbitrary word to mean a pointer-to-something-else. The above example would often appear as follows:
    typedef char* CHARPTR;
    ...
    CHARPTR dataptr;
    if (get_data(&dataptr)) ...
    ...
    bool get_data( CHARPTR* ptrptr )
    {
      *ptrptr = new char[256];
      return true;
    }

Final Thoughts

C/C++ is a great language for low-level programming, and part of the reason is that pointers let you implement all kinds of complex and compact data structures without ever having to actually shift blocks of memory around – normally an expensive operation. Proper understanding and use of pointers is an integral part of any low-level coding!

2 Comments

  1. Geof / Jun 1 2008 3:19 am

    After all these years of trying to figure out pointers this has actually helped my understanding.

    I feel pretty confident about the *, but the address (&) usage is still confusing to me. It makes sense when I see it used but I’m not sure I can connect the dots on my own.

    Still, all this is an interesting read. Thanks for posting it!

  2. naveen arora / Nov 21 2008 2:21 am

    hmmm… actually there is no low level coding in your post. i thought there some sort of program through which i can play with mouse, keyboard and vdu all,,,,
    any how….
