Vladimir Prus


vladimirprus.com

Thursday, October 27, 2005

Where am I?

How can an application or a library find out its own location on Linux?

Suppose you want an application for Linux to work right after being unpacked to some directory. No setting of a COOL_APP_HOME environment variable, no creation of ~/.coolapprc, nothing. If the application is one self-contained binary, there is no problem, but if there are some resources (icons, translations, or random data files), you need to find them somehow. And since the only thing the user did was unpack the application, you can only find resources by using a path relative to the application's own path. But finding the application's path is not straightforward.

The obvious approach is to use argv[0]. But it is not quite that simple, because argv[0] can be:

  1. An absolute path. You just strip file name from it and get the path to the application.
  2. A relative path. You need to join the current directory with that relative path, and strip the file name.
  3. Application name without the path. This can happen if the application was found via the PATH environment variable. It's necessary to iterate over all PATH elements, trying to find the application there.
  4. Anything. The value of argv[0] is specified in the exec* call and can be absolutely anything.
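
The first three cases above can be sketched roughly as follows. This is only an illustration: the function name guess_path_from_argv0 is made up for this example, the relative-path case assumes the current directory has not changed since startup, and the fourth case (an arbitrary argv[0]) cannot be detected at all, so the result is a guess rather than a guarantee.

```cpp
#include <limits.h>   // PATH_MAX
#include <unistd.h>   // access, getcwd
#include <cstdlib>    // std::getenv
#include <sstream>
#include <string>

// Hypothetical helper: best-effort guess at the executable's path from
// argv[0]. Returns an empty string when no candidate is found.
std::string guess_path_from_argv0(const std::string& argv0)
{
    if (argv0.empty())
        return std::string();

    // Case 1: an absolute path is used as-is.
    if (argv0[0] == '/')
        return argv0;

    // Case 2: a relative path is joined with the current directory.
    // Assumption: the process has not called chdir since it started.
    if (argv0.find('/') != std::string::npos) {
        char cwd[PATH_MAX];
        if (getcwd(cwd, sizeof(cwd)) == nullptr)
            return std::string();
        return std::string(cwd) + "/" + argv0;
    }

    // Case 3: a bare name is looked up in each PATH element in order.
    const char* path_env = std::getenv("PATH");
    if (path_env == nullptr)
        return std::string();
    std::istringstream dirs(path_env);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty())
            dir = ".";  // an empty PATH element means the current directory
        std::string candidate = dir + "/" + argv0;
        if (access(candidate.c_str(), X_OK) == 0)
            return candidate;
    }
    return std::string();
}
```

Calling guess_path_from_argv0(argv[0]) early in main gives a candidate path; stripping the file name from it is left out to keep the sketch short.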

A more reliable method is using the /proc filesystem. The /proc/<pid>/exe entry is a symbolic link to the application's executable, and /proc/self is the same as /proc/<pid of current application>. So, calling readlink on /proc/self/exe returns the path to the application.
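
A minimal sketch of the readlink call might look like this. The function names executable_path and executable_directory are invented for the example, and the fixed PATH_MAX buffer is a simplification.

```cpp
#include <limits.h>   // PATH_MAX
#include <unistd.h>   // readlink
#include <string>

// Returns the path that /proc/self/exe points to, or an empty string if
// the link cannot be read (for example when /proc is not mounted).
std::string executable_path()
{
    char buffer[PATH_MAX];
    // readlink does not null-terminate, so the returned length is used.
    ssize_t length = readlink("/proc/self/exe", buffer, sizeof(buffer));
    // A result that fills the whole buffer may have been truncated;
    // this sketch treats that as a failure.
    if (length <= 0 || static_cast<size_t>(length) >= sizeof(buffer))
        return std::string();
    return std::string(buffer, static_cast<size_t>(length));
}

// Strips the file name, leaving the directory that holds the executable.
std::string executable_directory()
{
    std::string path = executable_path();
    std::string::size_type slash = path.rfind('/');
    if (slash == std::string::npos)
        return std::string();
    return slash == 0 ? std::string("/") : path.substr(0, slash);
}
```

Resources can then be located relative to executable_directory(), for example by appending "/icons".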

But it does not work for shared libraries. A shared library can have its own resources, and might want to find them relative to the library's path. Using /proc/self/exe will give the path to the application, not to the library. The solution here is the dladdr function, an extension to the dynamic loader interface that is available on Linux. Here's an example of its use:

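#include <dlfcn.h>   // dladdr, Dl_info
#include <string>

std::string where_am_i()
{
    Dl_info info;
    // dladdr returns nonzero on success and 0 on failure.
    if (dladdr( reinterpret_cast<const void*>(&where_am_i), &info ) != 0)
    {
        // Path of the object file that contains where_am_i.
        return info.dli_fname;
    }
    ....
}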

The function takes a code address and returns information about the shared library that this address belongs to. So, if the where_am_i function is defined in a shared library, the above code will return the path to that library. Unfortunately, this works only for dynamic libraries, not for the application itself. So, for a really reliable solution one has to combine the /proc/self/exe trick with the dladdr trick.
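
One way the two tricks might be combined is sketched below. The helper name module_path and the rule it uses are assumptions made for this example: an absolute name reported by dladdr is trusted, and anything else falls back to /proc/self/exe. A library that was loaded through a relative path would be misreported by this rule, so treat it as a starting point only. On older glibc versions the program also needs to be linked with -ldl.

```cpp
#include <dlfcn.h>    // dladdr, Dl_info
#include <limits.h>   // PATH_MAX
#include <unistd.h>   // readlink
#include <string>

// Path of the running executable via /proc/self/exe, or "" on failure.
static std::string path_from_proc()
{
    char buffer[PATH_MAX];
    ssize_t length = readlink("/proc/self/exe", buffer, sizeof(buffer));
    if (length <= 0 || static_cast<size_t>(length) >= sizeof(buffer))
        return std::string();
    return std::string(buffer, static_cast<size_t>(length));
}

// Hypothetical helper: path of the module (shared library or executable)
// that contains the given code address.
std::string module_path(const void* address)
{
    Dl_info info;
    // dladdr returns nonzero on success.
    if (dladdr(address, &info) != 0
        && info.dli_fname != nullptr
        && info.dli_fname[0] == '/')
    {
        return info.dli_fname;
    }
    // Assumption: a missing or non-absolute name means the address lies
    // in the main executable, so ask /proc instead.
    return path_from_proc();
}
```

A library would call it with the address of one of its own functions, for example module_path(reinterpret_cast<const void*>(&some_function)), where some_function stands for any function defined in that library.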

The only problem is that both tricks are specific to Linux. Why is such basic functionality not in POSIX?

The last interesting case is when the "resource" your application uses is a shared library linked at startup (not loaded explicitly with dlopen). The path to the library must be in the dynamic linker's search path before the application is started, so the above tricks won't help. Fortunately, it's not necessary to create helper applications or scripts; you just need some extra options when linking:

g++ -o executable -Wl,-R -Wl,'$ORIGIN' executable.o libhelper.so

The -Wl,-R -Wl,'$ORIGIN' options add a new element "$ORIGIN" to the dynamic library search path stored in the executable, and the dynamic linker will replace $ORIGIN with the directory that contains the executable.

With all those tricks in hand, it's no longer necessary to know beforehand the directory where the application will be installed. But I'd still prefer nice built-in support, like Mac OS X bundles.

3 comments:

Anonymous said...

autopackage (www.autopackage.org) has been tackling the binary relocatability problem you describe, and they've come up with a few solutions.

For apps, /proc/self/exe is the best approach. realpath(argv[0], ...) works but like you said, exec() can break it.

For libraries, take the memory address of anything in the library's static memory (e.g. an empty string, "") and search for that address in /proc/self/maps; the line containing the address will give the library's full path.

Autopackage came up with another way, ./configure --prefix=/proc/self/fd/200, where fd 200 is created by a static constructor and points to the directory (the tool you want to look at in autopackage is binreloc).

OBLISK does ./configure --prefix=OBLISKOBLISK... and at install time does a search-and-replace of the binaries for that string and replaces it with the install path.

Windows has the easier way, GetModuleFileName()...

acyclic said...

dladdr stops working for shared libraries when you use -Wl,-rdynamic

acyclic said...

^^^ ignore, user error