Vladimir Prus


vladimirprus.com

Saturday, December 17, 2005

Not my bug

Often most debugging time is spent of pretty uninteresting bugs.

For example, most part of last Friday I was staring into debugger backed up by MIPS simulator, trying to figure out why exactly one test of 10 run decided to crash on a MIPS board. Let me first show you the code that was found guilty in the end:


static void itoa_really(unsigned long long value, unsigned int base)
{
    static char digits[] = {'0', '1', '2', '3', '4', '5',
                '6', '7', '8', '9', 'a', 'b', 'c', 
                'd', 'e', 'f'};

    if (!value)
        return;
    else {
        long rem;
        itoa_really(value / base, base);
        rem = value % base;
        target_putchar(digits[rem]);
    }
}
This is extremely basic code, but yes, it did broke everything.

Initially, the crash happened on a real MIPS board, on a pretty complex test. It was reduced to a test printing just two 64-bit values, and crash depended on the printed values. Say, printing 0 worked, but printing 0x11223344556677 crashed. It was not possible to debug the problem on the board, and the few floating licences for simulator were in use, so I tried to solve the problem by poking around. At one time I've added an extra line of code before call to "target_putchar", and the line was:


rem = 3;
Strangely, the crash was gone.

At this point I though that maybe there's some really wrong with 64 on 32 division, and 'rem' can become wrong, so I replace that "rem = 3" line with:


if (rem < 0) { rem = 0; }
if (rem > 15) { rem = 15; }
The crash appeared again.

Another try was:


if (rem < 3) { rem = 3; }
if (rem > 3) { rem = 3; }
and the crash was still there.

Finally, with:


if (rem <= 3) { rem = 3; }
if (rem > 3) { rem = 3; }
everything worked. Of course, all values was printed as "3333.....333", but it did not crash. Completely buffled, I decided simulator is the only help and eventually we've grabbed the license.

A couple of hours later, I found the one-character fix (literally). The linker config file specified mere 1KB of stack size, and when printing too large values, itoa_really recursed too deep, overflowing the stack, and overwriting a bunch of program variables. Changing to 4KB of stack set it all right. And what's even more funny, I never ever touched that linker config file, and have no idea what kind of programs are supposed to work on such stack space diet.

5 comments:

Anonymous said...

..but why are you using recursion for this algorithm anyway? it's a loop.

Vladimir Prus said...

Recursion is just more straight-forward to implement. This is not time-critical part (I'd expect talking over serial interface to be considerably more slow then actually formating number), or not memory-intensive.

Anonymous said...

"or not memory-intensive."
but you've made it memory intensive by using recursion where it's not needed.
using a loop you merely need a temporary buffer the size of the biggest string (64 bytes in this case).

#define MAX_STR 64
static void itoa_really(unsigned long long value, unsigned int base)
{
static char digits[] = {'0', '1', '2', '3', '4', '5',
'6', '7', '8', '9', 'a', 'b', 'c',
'd', 'e', 'f'};

int i=MAX_STR;
char tmp[MAX_STR+1];

for (tmp[i]=0; value; value=value/base)
tmp[--i] = digits[value%base];
while (i<MAX_STR)
target_putchar(tmp[i++]);
}

Vladimir Prus said...

Sure, loop is more memory intensive then recursion, but I basically took existing code and had no idea this small difference in memory usage matters, so did not bother to rewrite this as loop.

Vladimir Prus said...

Actually, I mean "loop is less memory intensive than recursion".