GDB Tutorial Command Line Walkthrough : Part 2

Track down those segfaults

Now that we’ve got an idea of how GDB works, let’s look at a more complex example.

The program in Listing 2 (see bottom of page) should print out a familiar character in your terminal window. It uses a C array called template to determine where to print out a character versus a space. The template contains numbers that are used in pairs – the first is the position in the buffer to start writing, the second is the number of characters to write. There are 18 pairs in total.

Compile the program as follows:

gcc -fno-stack-protector -g invader.c -o invader

(Note: The -fno-stack-protector option simply lets us see a buffer overrun at work without the compiler trying to save us from it. Stack protection is enabled by default on some platforms, such as Ubuntu, but not on others, such as Fedora; adding this flag disables it.)

When you run the invader program, you should immediately see a segmentation fault. Oh no! Don’t despair: load it into GDB with gdb invader, start it with r, and see if GDB can help. The segfault occurs and GDB stops, reporting that the program received SIGSEGV, along with a function and line number (screenshot below).

[Screenshot: segfault in GDB]

The line of code that the problem occurred in is handily displayed for you. If you want to see more source code, you can list it with l, which displays 10 lines around the current line. If you type l again, you’ll get the next 10 lines. To go backwards, type l -.

The line that GDB has stopped at gives us a big clue. It is a comparison of values in an array, so the segfault is possibly caused by the use of an index that is past the end of the declared array. Type:

(gdb) i lo

This command is short for info locals. It displays all the local variables in the current stack frame (i.e. the current function call). You can immediately see that the count variable is a very large number indeed, and our template array certainly isn’t that long. Let’s put a breakpoint at line 29 and see what’s happening to count:

(gdb) b 29

You don’t have to quit GDB to start again – just type r and GDB will ask you if you want to run the program from the beginning. Type y and on the next run, when the program stops at the for loop, you can add a watchpoint:

(gdb) watch count

This is a lovely little command that asks GDB to keep an eye on count and report its value each time it changes*. Now you’re all ready to see what’s going on, so continue with c. The next time the program breaks, GDB reports two values of count, old and new. So far this looks exactly as expected: the old value is 0 and the new value is 2, which is correct because the for loop increments count by 2 each time. Press Return to repeat the last command, and GDB will break again the next time count changes. This time things look wrong. The old value is 2, but the new value is 64 (Figure 3).

[Figure 3: GDB watchpoint]

Why is the value of count suddenly jumping to 64?

Let’s look at GDB’s output a little closer. When a watchpoint triggers, GDB displays the line it is currently halted at. That line has NOT yet been executed. It’s important to remember this. So, count has been changed, and GDB has stopped at line 35. List the source with l to see the code around it.

You’d be forgiven for looking at the source and thinking, well, the last instruction before line 35 was:

int i = 0;

at line 34; surely that’s not going to be doing anything bad?

But remember – GDB has stopped at the top of a for loop, and it may already have run several times. List the local variables with i lo and you can see that the value of i is not zero – therefore the loop has been executed several times and the last line to be run is actually the one below, where the character ‘@’ is written to the buffer.

You can also see that pos indexes a position outside the declared buffer length of 45 (the exact value of pos at this point may vary). The decimal ASCII value of the ‘@’ character is 64, which is exactly what count is, so there you have it: the buffer is being overrun and, as a consequence, ‘@’ signs are being stamped all over the next variable in memory, which happens to be count.

Now you’ve found the problem, but you still need to know why it’s happening so you can fix it.

The pos variable should never be greater than the buffer length (if the template is correct), and the loop will increment pos while i is less than len. The value of len is set directly from the template and in this case is 33. However, if you look at the template you can see that there isn’t an instance of the value 33 anywhere.

To find out why len is such an odd number, you need to adjust the current session to focus on the len variable. Type d to delete all breakpoints, then set a new one at the point where len is set with b 32.

For reference, breakpoints (and watchpoints) can also be deleted individually: they are allocated numbers when you create them, so you can list them with i b and delete them one by one with d <breakpoint number>.

Restart the program again with r. When GDB hits the new breakpoint, remember that the line you are looking at hasn’t yet been executed. Run this line by typing n, and then examine len with the print command:

(gdb) p len

And take a look at count too, to see where in the template we are obtaining len from:

(gdb) p count

The variable count is zero, so we’re reading the first item from the template instead of the second and then adding 1, making 9. This is our bug, as the template holds pair values and the second item should be used as the length. The programmer has made a typo (or misunderstood how arrays are referenced), by adding ‘+1’ outside the subscript operator at line 32. You can fix this by changing line 32 to:

int len = template[count+1];

Now exit GDB with q, recompile the source and re-run. Strictly speaking you don’t have to quit: GDB checks whether the executable has changed when you next type r and re-reads it if so. What it cannot do is recompile for you, so if you edit the source and forget to rebuild, you’ll be debugging the same old version of the program as before. This is something that even veterans of GDB sometimes forget!

So there we have it, our first buffer overrun tracked down with minimum pain and fuss. Just a little more to do and we should have a fully functioning program.

On to Part 3.

*The watch command is very useful, but be aware of a limitation in multithreaded programs. If GDB cannot use a hardware watchpoint and falls back to a software one, it only checks the variable while the current thread is running, so it will not notify you of changes made while that thread is not being executed. If you are facing a memory overwrite that originates in another thread, you won’t be notified at the time it happens; you will only see the change when you switch back to the thread that contains the variable you are watching.

Listing 2

Copy and paste, or download the invader.c file here.

/* 
# invader.c 
# Print a pattern to the terminal using a predefined area 
# of 44 characters width and 16 lines height 
# The template defines the pattern with a series of pairs: 
# the first is the position in the current line that should 
# be filled in, the second is how many characters should be filled. 
*/ 

#include <stdio.h>  
#include <string.h> 

#define BUF_LEN 45  /*44 plus null terminator*/ 
#define TEMPLATE_LEN 37 

int template[TEMPLATE_LEN] = 
{ 8,4,32,4,12,4,28,4,8,28,4,8,16,12,32,8,0,44, 
0,4,8,28,40,4,0,4,8,4,32,4,40,4,12,8,24,8,0 }; 

int main() 
{ 
    unsigned int count = 0; 
    char buffer[BUF_LEN]; 

    memset(buffer, ' ', BUF_LEN); 
    buffer[BUF_LEN-1] = 0x00; 

    printf("\n\n"); 
    for (count = 0 ; count < TEMPLATE_LEN-1; count+=2) 
    { 
        int pos = template[count]; 
        int len = template[count]+1;				 

        int i = 0;
        for (i = 0 ; i < len ; ++i, ++pos)
        {
            buffer[pos] = '@';
        }

        /*check for end of row*/
        if (template[count] >= template[count+2])
        {
            /*print and reset*/
            printf("  %s\n  %s\n", buffer, buffer);
            memset(buffer, ' ', BUF_LEN);
            buffer[BUF_LEN-1] = 0x00;
            count = 0;
        }
    }
    printf("\n\n"); 

    return 0; 
}

One Comment

  1. Shaun McDonough
    Posted 11 April 2016 at 23:35

    I’m not seeing the Core dump when I run the program in Listing 2. I tried it on my Beaglebone with Debian and my Linux Mint 17.3 box.

    I ran the program for several minutes and it just kept on spitting out characters.