A Challenge Discussion – Returning Pointers To Local Variables

Last week’s challenge was a tough one, so don’t be alarmed if you had no idea what the issue might have been.

Also, it can be hard to spot problems in a functional program because you tend to concentrate on what the code is actually doing, rather than the validity of the code itself.

To give you a clue, before I talk about last week’s code, here’s a very small version with a similar problem:

#include <iostream>
#include <string.h>

using namespace std;

int* func()
{
    int num = 256;
    int* tmp;
    tmp = &num;
    return tmp;
}

int main()
{
    int* x = func();
    cout << "Value of x is: " << *x << endl;
    return 0;
}

You might spot this one immediately, but you can be forgiven for missing even this.

The output of this program is as you would expect: it prints 256 to the screen.

However, what if I modify the code to add a couple of extra (unrelated) lines:

#include <iostream>
#include <string.h>

using namespace std;

int* func()
{
int num = 256;
int* tmp;
tmp = &num;
return tmp;
}


int main()
{
int* x = func();

char tmp[] = "hello world"; //extra line of code
int len = strlen(tmp);      //extra line of code

cout << "Value of x is: " << *x << endl;
return 0;
}

Now, the output of the program (on my machine) is zero.

What!?

Yes, that’s right. Apparently, creating an array of chars and finding the length of it has messed up my original code. Surely not?

In fact, that is exactly what has happened.

If you look at func, you can see that I’m creating an int and setting it to 256. All good.

Then I create a pointer to an int and point it to 256. Still okay.

Then I return the pointer from func so someone else can see what it points to.

Uh oh!!

As soon as I exit func, the variables all go out of scope. That means that they no longer exist and whether or not they return the right value is down to pure chance.

In the first program, I get lucky, and it prints correctly, because 256 just happens to be sitting in memory undisturbed. It’s still there, but only because nothing has used that memory – which is now considered free and available to use. In the second program however, the additional lines that call strlen overwrite the stack and my original 256 value is lost forever. Bah!

As you can see, even with -Wall, there is NO compiler warning about this, and if you get lucky (as in program 1), you might never even know a bug was there. In fact, if the value you are expecting falls within a wide range, rather than being set at say 256, there is every chance that you could run this code for years and be none the wiser that the values you are getting back are in fact totally unrelated to what you are expecting (I have seen this exact bug in real life).

So, the moral of this is: never return a pointer to a local variable.

Now let’s go back to last week’s code and see what’s happening:

#include <iostream>
#include <string.h>

using namespace std;

char* isPalindrome(char* word)
{
    char* ret = 0;
    ret = (char*)"Yes! This is a palindrome.";
	
    char *p = word;
    int len = strlen(word);
    char *q = &word[len-1];

    for (int i = 0 ; i < len ; ++i, ++p, --q)
    {
        if (*p != *q)
        {
            ret = (char*)"This is not a palindrome.";
        }
    }
    return ret; 
} 

int main()
{ 	
    while (1)
    {
        char buffer[16] = {0};
        cin >> buffer;

        char* result = isPalindrome(buffer);

        cout << result << endl;
    }
    return 0;
}

In this code, we are doing a very similar thing. In the isPalindrome function, we’re declaring a pointer to a char and returning it to the main function. This is bad, right? Because once the isPalindrome function returns, all the variables will go out of scope.

Well, it turns out to be even more complex than that (I told you this was a tough one, but bear with me – it’s totally worth reading to the end!).

You would expect that even though this program runs correctly at the moment, if the program was larger and had more code, there would be every chance that something would overwrite the stack and you wouldn’t get the right result back. But the thing is, if you try to break this code you can’t. It doesn’t matter what you do, how many more variables or functions you add, how much stack space you use up, it always runs correctly.

How can this be possible?

Here’s a clue.

If you change the main function to add this line you get a segfault at run time:

int main()
{ 	
    while (1)
    {
        char buffer[16] = {0};
        cin >> buffer;

        char* result = isPalindrome(buffer);
        result[0] = 'x'; //ADD THIS LINE HERE

        cout << result << endl;
    }
    return 0;
}

Why on earth?

The program segfaults because by trying to alter the string that is returned you are actually accessing read-only data.

How come?

When the program is compiled, string literals (like “This is not a palindrome.”), are treated differently to everything else and are stored in a special area of read-only memory which is neither the heap nor the stack. In fact, if you examine the memory addresses of the variables in gdb, you would see that the address that char* ret points to is very different to the all the other addresses that are allocated to the program variables.

So, in this example, we return a pointer to an address in read-only memory and don’t have to worry about anything overwriting it, because it is safely tucked away from the rest of the code!

However, when someone tries to write to what they might assume is a character array, using the line added above, they would get a segfault.

Nasty, eh?

So, the big question, is this correct correct and valid, or not?

I’d say not really. Firstly, you shouldn’t be returning a pointer to a local variable – in this case you’d get lucky and it would always work. However, better practice would be to declare the string as const, so a programmer couldn’t try to modify it without realising the consequences. It should also be made 100% clear with a comment that this was intended and you aren’t just returning a pointer to a local variable without understanding what the consequences are generally. Finally, you should be aware that this isn’t always the case – some compilers/processors may implement string literals differently. Overall verdict? Don’t do it.

Just before I finish up, if you’re wondering how to ensure that your string literal is actually a char array that is writeable and not a string literal in read-only memory, the difference is as follows:

char *x = "This is a string literal";
char y[] = "This is a character array";

char y above is another way of writing the following:

char y[] = {'T','h','i','s',' ','i','s',' ','a',' ','c','h','a','r','a','c','t','e','r',' 'a','r','r','a','y','\0'};

Scoring

One point for recognising that I shouldn’t have returned a pointer to a local variable.

A big bonus point for recognising that I was using a string literal that couldn’t be modified even if it was successfully accessible after the function exits.

Did you learn something new?

I told you it was worth reading to the end 😉

Have a good week!



A Challenge

I have a challenge for you. Is there a problem with this code?

It compiles without warnings, and runs as expected.

I’ll reveal all next week.

Have fun!

g++ -g -Wall palindrome_problem.cpp
#include <iostream>
#include <string.h>

using namespace std;

char* isPalindrome(char* word)
{
    char* ret = 0;
    ret = (char*)"Yes! This is a palindrome.";
	
    char *p = word;
    int len = strlen(word);
    char *q = &word[len-1];

    for (int i = 0 ; i < len ; ++i, ++p, --q)
    {
        if (*p != *q)
        {
            ret = (char*)"This is not a palindrome.";
        }
    }
    return ret; 
} 

int main()
{ 	
    while (1)
    {
        char buffer[16] = {0};
        cin >> buffer;

        char* result = isPalindrome(buffer);

        cout << result << endl;
    }
    return 0;
}


Virtual Destructors

Interview questions often ask about virtual destructors and why they are needed. It’s one of those funny things about C++ that unless you have specifically been shown, you just might not realise you need to know.

So why do you need a virtual destructor?

Essentially, you need a virtual destructor to make sure that ALL the relevant destructors are called when you delete an object via a base class pointer.

Huh?

Well, that actually sounds far more complicated than it is.

Let’s use an example to make it easier. Imagine you have a base class called Container and a sub class called CardboardBox. You create a CardboardBox, and then use the base class pointer to access it (why? see below). When you’re all done, it’s time to delete your object.

Now, the base class doesn’t do much in the way of clearing up, but the subclass (the cardboard box), can be recycled.

So, when your base class destructor is virtual, deleting the base class pointer will call the sub class destructor without you having to do anything extra – and the box will be recycled as if by magic.

If, however, your base class destructor is not virtual, only the base class destructor will get called and the recycling will not happen. Very bad for the environment.

Ouch.

Let’s see this in action with a code example:

#include <iostream> 
using namespace std;

class Container
{
public:
	Container() {};
	virtual ~Container()
	{
		cout << "Base class destructor called - not much to do." << endl;
	};
};

class CardboardBox: public Container {
public:
	CardboardBox() {};
	~CardboardBox()
	{
		cout << "Sub class destructor called - let's recycle the cardboard!" << endl;
	};
};

int main()
{
	CardboardBox* box = new CardboardBox;

	Container* p = box;

	delete p;

	return 0;
}

When the destructor is virtual (as it should be), the output of this program is as follows:

virtual1

You can see that both the sub class and the base class destructors have been called, and you have been a green and conscientious consumer.

However, if you remove the ‘virtual’ keyword from the base class destructor, the output changes to this:

virtual2

Oh dear. No one recycled their cardboard today.

So you see, making your base class destructors virtual will help keep everything nice and tidy. Which is of course, just how we like it :-)

Why are you using a base class pointer to access the cardboard box?

This is the key to polymorphism in C++. You can use a base class to interface with objects without knowing what they are. So in this example you can talk to objects via the base class (Container), and you don’t need to specifically know whether they are cardboard boxes, metal boxes, or wicker baskets!

I’ve a fuller explanation of polymorphism here.



Linux Commands: nl

There is an incredible selection of little Linux commands that will do all sorts of things to your files, standard output and programs. Now and again I’m going to pick one out and have a play because you just never know when a little tool already exists that will do exactly what you need.

Today I’m going to look at nl, which prepends line numbers to any file that you pass it to.

Remember this one as “number lines” (think Numberwang on Peep Show). Don’t get confused and use ln, because that is for linking files :-)

Okay, so I’ve got a little C program, and you can view the source code on the command line with cat:

cat main.c

cat

Very nice. Now let’s output the file with nl instead:

nl main.c

nl1

Look at that! Numbers down the side. Isn’t that handy?

But that’s not all.

Say you wanted to number all the lines – not just the ones with source code. This time use the flag -ba:

nl -ba main.c

nl2

Nice.

Now let’s go a little retro, start the numbering at 10 and increase it by 10 each time (BBC Basic anyone?).

Use -v to start at your required number and -i to specify the increment:

nl -ba -v10 -i10 main.c

nl3

You can even add a string in front of each line.

Here I’ve added a colon after each number:

nl -ba -v10 -i10 -s": " main.c

nl4

The only thing missing really is the option to add disco colours and flashing text 😉

Check out the man page for full details on all the options (man nl).