Vladimir Prus


vladimirprus.com

Thursday, March 24, 2005

Duplicate function bodies

A couple of days ago I learned that the Microsoft linker can merge functions with binary-identical bodies. This looks like a rather good idea, especially with templates, and it seems that gcc does not implement it. So why not estimate the possible size savings?

Basically, we just need to get all function bodies from an object file or shared library and check for duplicates. The only thing to watch out for is relocations. Consider this code:

extern int a, b;
int foo() { return a; }
int bar() { return b; }
The binary code for both functions will be identical:
   _Z3foov:
55                    pushl   %ebp
89E5                  movl    %esp, %ebp
A100000000            movl    a, %eax
5D                    popl    %ebp
C3                    ret
But later, either during static linking or at application startup, those eight zeros will be replaced with the addresses of variables a and b, which are obviously different.

Luckily, this is easy to fix. When compiling with -fPIC, the compiler will produce code that is not modified at startup: relocations are applied to the global offset table, not to the code itself. This means that two identical function bodies will remain identical. Since all shared libraries are built with -fPIC, we can safely use shared libraries for testing.

However, running the test program on a couple of template-intensive Boost libraries produced rather discouraging results.

Boost.Python:

461 functions found
2073 unique function bodies
205589 bytes of function bodies
Function merging would have saved 3898 bytes

Boost.Regex:

3313 functions found
2782 unique function bodies
484635 bytes of function bodies
Function merging would have saved 7808 bytes

Most of the functions with duplicate bodies were rather small -- 5-10 bytes -- basically empty. Probably there's some other factor I did not consider, or maybe gcc's register allocator is non-deterministic. What the actual problem is will be a topic for a future post.

As an aside, I used the BFD library to read the files in ELF format, and came away with rather mixed feelings. On one hand, I did what I wanted. On the other hand, the library is horrible. The documentation structure is the worst I've ever seen, and it's very hard to find anything. There's no complete compilable example to get started from. The interface is terrible C, with the need to manually allocate memory all over the place. And finally, I was hit by two weird things.

First, the following reasonable-looking code does not work:

bfd* object = bfd_openr("object.o", "i686-pc-linux-gnu");
assert(object);
asection* s = bfd_get_section_by_name(object, ".text");
The second BFD call just segfaults. Changing the code to
bfd* object = bfd_openr("object.o", "i686-pc-linux-gnu");
assert(object);
// BFD only reads the file headers and populates the section
// list once the format has been checked.
assert(bfd_check_format(object, bfd_object));
asection* s = bfd_get_section_by_name(object, ".text");
makes it work fine. That's just voodoo.

Another problem is that it is not possible to get the size of a symbol. Really, this basic information cannot be obtained via BFD's public interface.

3 comments:

Anonymous said...

This linker optimisation flag is generally a pretty bad idea: C/C++ guarantees that the addresses of two different functions are not equal, but turning this switch on breaks that.
We also found that on large links, the linker would spend up to 20 minutes comparing all the functions against each other -- usually for very little gain. This is a specialist flag that's only useful on small projects with heavily templated code, where the templating generates lots of identical functions for each template instantiation, and even then it should be used with care.

Vladimir Prus said...

Those are quality-of-implementation issues. It's quite possible to disable the body-merging optimization for functions that ever have their address taken. And it's possible to detect duplicate function bodies with an N*log(N) algorithm, or even in "practically linear" time if you use a hash table.

The problem is real, as with template-intensive code gcc generates huge binaries.

momotte said...

"The problem is real, as with template-intensive code gcc generates huge binaries"

clearly, especially on embedded platforms or even current consoles like the PS3, where having small binaries is crucial.

and why not an intermediate setting:
merging functions that have identical signatures, instead of identical binary bodies.
it would be both faster to merge, and wouldn't mess up debugging.
and it would still merge all the templated functions that are automatically emitted in each compilation unit they're used in.