Vladimir Prus


Tuesday, April 04, 2006

Unlucky numbers: 48, 58 and 388

One day I got a remarkable bug report:
We're calling the function from your library, and after 48 successful calls, it crashes. Can you look into this?
Initially, I was curious how they had arrived at that '48'. It turned out that the application was repeatedly performing the same action, calling my code along the way, and counting the repetitions. So '48' was the exact number, and the thing consistently crashed after 48 calls.

The bug report did not include any calling code, so I asked for it to be sent and went home. On the bus, I decided that the cause was either a fixed-size buffer in the calling code or a resource leak, such as a file descriptor leak.

And sure enough, the next morning I looked at the only place where I used a plain FILE* (needed to feed a Bison-based parser), and the fclose call was missing. Feeling rather smart, I sent the fixed version back.
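A minimal sketch of that kind of fix, in C. The original code is not shown here, so parse_file and parse_stream are hypothetical names, and parse_stream is only a line counter standing in for the Bison-generated parser.

```c
#include <stdio.h>

/* Hypothetical stand-in for the Bison-generated parser entry point.
   To keep the sketch self-contained it only counts newline characters. */
static int parse_stream(FILE *in)
{
    int c;
    int lines = 0;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n')
            ++lines;
    }
    return ferror(in) ? -1 : lines;
}

/* Opens the file, hands the stream to the parser, and closes the stream
   again. Returns the parser result, or -1 if the file cannot be opened. */
int parse_file(const char *path)
{
    FILE *in = fopen(path, "r");
    int result;

    if (in == NULL)
        return -1;

    result = parse_stream(in);
    fclose(in); /* the call that was missing in the leaking version */
    return result;
}
```

Without the fclose, every call leaves one file descriptor open, and fopen starts returning NULL once the per-process limit is exhausted. That is exactly the kind of failure that shows up only after a fixed number of calls.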

Thirty minutes later a new bug report arrived, saying that the code now failed after 58 calls. And this time the bug did not reproduce for me. After several tries I found out that for me the unlucky number was actually 388, so I just had to wait a bit longer to reproduce it.

This was a resource leak too, though a subtle one. The library called an external tool and modified the PATH environment variable to make sure the tool would be found. It did so on every call, so the length of PATH steadily increased until some OS limit was reached. After that, the value of PATH became completely bogus and the external tool could no longer be found.
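One way to avoid this kind of growth is to add the directory only when it is not already listed. The sketch below assumes a POSIX system (setenv, colon-separated PATH). The names ensure_in_path and path_contains are made up for illustration, and this is not the fix the library actually used.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv */
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if dir is one of the colon-separated entries of path. */
static int path_contains(const char *path, const char *dir)
{
    size_t len = strlen(dir);

    while (path != NULL && *path != '\0') {
        const char *end = strchr(path, ':');
        size_t n = end ? (size_t)(end - path) : strlen(path);

        if (n == len && strncmp(path, dir, len) == 0)
            return 1;
        path = end ? end + 1 : NULL;
    }
    return 0;
}

/* Prepends dir to PATH unless it is already listed there.
   Returns 0 on success and -1 on failure. */
int ensure_in_path(const char *dir)
{
    const char *old = getenv("PATH");
    size_t size;
    char *fresh;
    int rc;

    if (old == NULL || *old == '\0')
        return setenv("PATH", dir, 1);
    if (path_contains(old, dir))
        return 0; /* already present, so PATH does not grow */

    size = strlen(dir) + 1 + strlen(old) + 1;
    fresh = malloc(size);
    if (fresh == NULL)
        return -1;

    snprintf(fresh, size, "%s:%s", dir, old);
    rc = setenv("PATH", fresh, 1);
    free(fresh); /* setenv keeps its own copy of the value */
    return rc;
}
```

Calling ensure_in_path with the same directory any number of times leaves PATH at the same length. An alternative design is to save the old value before running the tool and restore it afterwards.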

I'm really glad we have Valgrind, so that at least memory leaks don't require any magic to debug.
