Vladimir Prus


vladimirprus.com

Saturday, December 17, 2005

Not my bug

Often most debugging time is spent of pretty uninteresting bugs.

For example, most part of last Friday I was staring into debugger backed up by MIPS simulator, trying to figure out why exactly one test of 10 run decided to crash on a MIPS board. Let me first show you the code that was found guilty in the end:


static void itoa_really(unsigned long long value, unsigned int base)
{
    static char digits[] = {'0', '1', '2', '3', '4', '5',
                '6', '7', '8', '9', 'a', 'b', 'c', 
                'd', 'e', 'f'};

    if (!value)
        return;
    else {
        long rem;
        itoa_really(value / base, base);
        rem = value % base;
        target_putchar(digits[rem]);
    }
}
This is extremely basic code, but yes, it did broke everything.

Initially, the crash happened on a real MIPS board, on a pretty complex test. It was reduced to a test printing just two 64-bit values, and crash depended on the printed values. Say, printing 0 worked, but printing 0x11223344556677 crashed. It was not possible to debug the problem on the board, and the few floating licences for simulator were in use, so I tried to solve the problem by poking around. At one time I've added an extra line of code before call to "target_putchar", and the line was:


rem = 3;
Strangely, the crash was gone.

At this point I though that maybe there's some really wrong with 64 on 32 division, and 'rem' can become wrong, so I replace that "rem = 3" line with:


if (rem < 0) { rem = 0; }
if (rem > 15) { rem = 15; }
The crash appeared again.

Another try was:


if (rem < 3) { rem = 3; }
if (rem > 3) { rem = 3; }
and the crash was still there.

Finally, with:


if (rem <= 3) { rem = 3; }
if (rem > 3) { rem = 3; }
everything worked. Of course, all values was printed as "3333.....333", but it did not crash. Completely buffled, I decided simulator is the only help and eventually we've grabbed the license.

A couple of hours later, I found the one-character fix (literally). The linker config file specified mere 1KB of stack size, and when printing too large values, itoa_really recursed too deep, overflowing the stack, and overwriting a bunch of program variables. Changing to 4KB of stack set it all right. And what's even more funny, I never ever touched that linker config file, and have no idea what kind of programs are supposed to work on such stack space diet.

6 comments:

Anonymous said...

..but why are you using recursion for this algorithm anyway? it's a loop.

Vladimir Prus said...

Recursion is just more straight-forward to implement. This is not time-critical part (I'd expect talking over serial interface to be considerably more slow then actually formating number), or not memory-intensive.

Anonymous said...

"or not memory-intensive."
but you've made it memory intensive by using recursion where it's not needed.
using a loop you merely need a temporary buffer the size of the biggest string (64 bytes in this case).

#define MAX_STR 64
static void itoa_really(unsigned long long value, unsigned int base)
{
static char digits[] = {'0', '1', '2', '3', '4', '5',
'6', '7', '8', '9', 'a', 'b', 'c',
'd', 'e', 'f'};

int i=MAX_STR;
char tmp[MAX_STR+1];

for (tmp[i]=0; value; value=value/base)
tmp[--i] = digits[value%base];
while (i<MAX_STR)
target_putchar(tmp[i++]);
}

Vladimir Prus said...

Sure, loop is more memory intensive then recursion, but I basically took existing code and had no idea this small difference in memory usage matters, so did not bother to rewrite this as loop.

Vladimir Prus said...

Actually, I mean "loop is less memory intensive than recursion".

clash royale private said...

Legendary Royale is the best Clash Royale Private Server. New created and edited cards by the user ideas! Get the APK now and play your own Clash Royale!