A couple of days ago I've learned that the Microsoft linker can merge functions with binary identical bodies. Looks rather good idea, especially with templates, and it seems that gcc does not implement this. Why don't estimate possible size saving?
Basically, we just need to get all function bodies from an object file or shared library and check for duplicates. The only thing to watch out are relocations. Consider this code:
extern int a, b;
int foo() { return a; }
int bar() { return b; }
The binary code for both functions will be identical:
But later, either during static link or during application startup, those eight zeros will be replaced with addresses of variables_Z3foov: 55 pushl %ebp 89E5 movl %esp, %ebp A1000000 movl a, %eax 00 5D popl %ebp C3 ret
a
and
b
, which are obviously different.Luckily, this is easy to fix. When compiling with -fPIC, the compiler will produce code that's not modified during startup. This means that two identical function bodies will remain identical. Since all shared libraries are built with -fPIC, we can safely use shared libraries for testing.
However, running the test program on a couple of template-extensive Boost libraries produce rather discouraging results.
461 functions found 2073 unique function bodies 205589 bytes of function bodies Function merging would have saved 3898 bytes
3313 functions found 2782 unique function bodies 484635 bytes of function bodies Function merging would have saved 7808 bytes
Most of the functions with duplicate bodies were rather small in size -- 5-10 bytes -- basically empty. Probably, there's some other variable I did not consider. Or maybe gcc's register allocator is non-deterministic. What is the actual problem is a topic for some future post.
As an aside, I've used the BFD library to read the files in ELF format, and got rather mixed feelings. On one hand, I've did what I wanted. On the other hand, the library is horrible. The documentation structure is the worst I've ever seen, and it's very hard to find anything. There's no complete compilable example to get started. The interface is terrible C -- with the need to manually allocate memory all over. And finally, I was hit by two weird things.
First, is that the following reasonably-looking code does not work:
bfd* object = bfd_openr("object.o", "i686-pc-linux-gnu");
assert(object);
asection* s = bfd_get_section_by_name(object, ".text");
The second call just segfaults. Changing the code to
bfd* object = bfd_openr("object.o", "i686-pc-linux-gnu");
assert(object);
assert(bfd_check_format (object, bfd_object));
asection* s = bfd_get_section_by_name(object, ".text");
makes it work fine. That's just voodoo.Another problem, is that is not possible to get the size of a symbol. Really, this basic information can't be obtained via BFD's public interface.
4 comments:
This linker optimisation flag is generally a pretty bad idea - C/C++ guarantees that the addresses of two different functions are not equal, but turning this switch on breaks that.
We also found that on large links, the linker would spend up to 20 minutes considering all the functions against each other - usually for very little gain. This is a specialist flag that's only usefuil on small projects with heavily templated code where the templating generates lots of identical functions for each template instantiation, and even then should be used with care.
Those are quality of implementation issues. It's quite possible to disable merge body optimization for functions that have their address ever taken. And it's possible to detect duplicate function bodies with N*log(N) algorithm, or even in "practically linear" time if you use hash table.
The problem is real, as with template-intensive code gcc generates huge binaries
"The problem is real, as with template-intensive code gcc generates huge binaries"
clearly, especially on embedded platforms or even current consoles like the PS3, where having small binaries is crucial.
and why not an intermediate setting:
merging functions that have identical signatures, instead of identical binary bodies.
it would be both faster to merge, and wouldn't fuck-up debugging.
and would still merge all the templated functions that are automatically inserted inside each compilation-unit they're used in.
Thaank you
Post a Comment