Help Ld to Sort Static Libraries

| Comments

The GNU linker ld is not very smart when it comes to finding functions in static libraries. It’s a bit weird because that is what linkers are primarily supposed to do.

Let’s say you have a function f in a library libF and a function g in a library libG, and g is calling f. Now if you try to link an executable that uses function g and pass the libraries to ld in the alphabetical order (which is reasonable) like so:

$ gcc main.c libF.a libG.a

you are gonna get an error about missing function f:

libG.a(g.o): In function `g':
g.c:(.text+0xa): undefined reference to `f'
collect2: error: ld returned 1 exit status

Weird, eh? Well, apparently it’s a feature or rather an optimization.

Normally, an archive is searched only once in the order that it is specified on the command line. If af symbol in that archive is needed to resolve an undefined symbol referred to by an object in an archive that appears later on the command line, the linker would not be able to resolve that reference.

To fix the problem you’d have to pass the libraries in a different order, so that libG comes first on the list. For a more complicated case, when there are circular dependencies between libraries ld offers a special crutch: --start-group/--end-group. You’d have to put these libraries in a group to tell ld to keep searching over and over until all the symbols are resolved.

Though it’s possible to put all the libraries in one big group, man pages claim that it might result it a performance penalty. So it’s always best to pass libraries in the right order not to slow down the already slow linking process even further.

On a large project the amount of libraries could be overwhelming and sorting them by hand might be really annoying. So let’s try to give ld a helping hand and presort the libraries with a bit of help from the graph theory.

Using the nm tool it’s possible to list symbols that are external to a given static library. Those are the symbols that the library needs or imports. It’s also possible to list the symbols the library provides for others to use or exports.

$ nm --portability --extern-only --defined-only lib.a
$ nm --portability --extern-only --undefined-only lib.a

Using these two bits of information it’s possible to figure out the dependencies between the libraries. To do that we need to walk all the pairs of libraries A and B and build the adjacency matrix. The library A depends on the library B when the list of A’s imports has a non-empty intersection with the list of B’s exports. When A depends on B we record an edge form A to B.

Once we have the adjacency matrix built we can transform it into a graph. If the graph is acyclic it could be easily sorted topologically to arrange the libraries in the order that would make ld happy.

In the case of a cyclic graph there’s still a bit of work to be done. The simple solution would be to wrap all the libraries in --start-group and --end-group. It’s gonna work, sure, but it’s not very efficient. There might a hundred libraries and only two with circular dependencies. So the optimal solution would be to sort all the libraries in the topological order and only wrap those two in a group.

To do that we need to find the dependency loops or strongly connected components inside our graph. Those components or groups of libraries could be collapsed to single graph vertices which now represent groups of libraries rather single ones. The new graph with the collapsed loops now should be an acyclic one and could be sorted topologically like in the case above.

It sound like a lot of work to get this implemented. Fear not! With the right set of tools this becomes just a few lines of code. Ruby has a nice gem to work with graphs. It already contains all the building blocks mentioned above. All we need to do is just stitch them together in the right order. and enjoy the results (complete source code).

Shouldn’t ld do something like this (perhaps more efficient version of this) internally?!