Vladimir Prus


vladimirprus.com

Thursday, April 27, 2006

Non-constant size

Quite some time ago, when I was learning STL, all information sources stressed the importance of learning complexity guarantees that methods of various containters make. One specific subtle thing is that the std::list<>::size() method runs in linear time, not in constant time. It was explicitly designed that way for a reason, described in STL faq, but what matters to an ordinary programmer is that testing lists for emptyness should be done with the empty method, not by comparing size to zero, otherwise it's easy to kill performance.

Today, I run into another case where non-const-time size() matters. I was testing KDevelop on some testcase, and noticed that getting list of stack frames from gdb takes a lot of time. I've added some profiling code, and found that a one command takes 200ms to execute. Adding profiling code to gdb revealed that gdb itself takes some 70ms. Of course, that's not ideal, but even larger fraction of time was apparently spend in KDevelop, ehm, parsing the response.

So I've quickly put up a testcase that repeatedly parses a specific response, and ran it under callgrind. Ten minutes later I've got a profile with strlen on top. It turned out that the parsing code was using QCString and calling it's length method at least one for each token, and for certain tokens -- once for each character. The length, in turn, just calls strlen. Since the input string was 20K in size, most of runtime was spend measuring the size of that string.

Another unexpected behaviour was found in the QCString::mid function. Internally, it also calls length, and mid was called once for each token.

After uses of non-const-time methods were reduced to minimum, the parsing time my test case decreased 40x. No so bad, I think. The only problem is that time spend in gdb is still to high for a GUI, and that won't be that easy to fix.

No comments: