0xDEADBEEF

and 0xBADC0FFEE

Help Ld to Sort Static Libraries

| Comments

The GNU linker ld is not very smart when it comes to finding functions in static libraries. It’s a bit weird because that is what linkers are primarily supposed to do.

Let’s say you have a function f in a library libF and a function g in a library libG, and g is calling f. Now if you try to link an executable that uses function g and pass the libraries to ld in the alphabetical order (which is reasonable) like so:

$ gcc main.c libF.a libG.a

you are gonna get an error about missing function f:

libG.a(g.o): In function `g':
g.c:(.text+0xa): undefined reference to `f'
collect2: error: ld returned 1 exit status

Weird, eh? Well, apparently it’s a feature or rather an optimization.

Normally, an archive is searched only once in the order that it is specified on the command line. If af symbol in that archive is needed to resolve an undefined symbol referred to by an object in an archive that appears later on the command line, the linker would not be able to resolve that reference.

To fix the problem you’d have to pass the libraries in a different order, so that libG comes first on the list. For a more complicated case, when there are circular dependencies between libraries ld offers a special crutch: --start-group/--end-group. You’d have to put these libraries in a group to tell ld to keep searching over and over until all the symbols are resolved.

Though it’s possible to put all the libraries in one big group, man pages claim that it might result it a performance penalty. So it’s always best to pass libraries in the right order not to slow down the already slow linking process even further.

On a large project the amount of libraries could be overwhelming and sorting them by hand might be really annoying. So let’s try to give ld a helping hand and presort the libraries with a bit of help from the graph theory.

Using the nm tool it’s possible to list symbols that are external to a given static library. Those are the symbols that the library needs or imports. It’s also possible to list the symbols the library provides for others to use or exports.

$ nm --portability --extern-only --defined-only lib.a
$ nm --portability --extern-only --undefined-only lib.a

Using these two bits of information it’s possible to figure out the dependencies between the libraries. To do that we need to walk all the pairs of libraries A and B and build the adjacency matrix. The library A depends on the library B when the list of A’s imports has a non-empty intersection with the list of B’s exports. When A depends on B we record an edge form A to B.

Once we have the adjacency matrix built we can transform it into a graph. If the graph is acyclic it could be easily sorted topologically to arrange the libraries in the order that would make ld happy.

In the case of a cyclic graph there’s still a bit of work to be done. The simple solution would be to wrap all the libraries in --start-group and --end-group. It’s gonna work, sure, but it’s not very efficient. There might a hundred libraries and only two with circular dependencies. So the optimal solution would be to sort all the libraries in the topological order and only wrap those two in a group.

To do that we need to find the dependency loops or strongly connected components inside our graph. Those components or groups of libraries could be collapsed to single graph vertices which now represent groups of libraries rather single ones. The new graph with the collapsed loops now should be an acyclic one and could be sorted topologically like in the case above.

It sound like a lot of work to get this implemented. Fear not! With the right set of tools this becomes just a few lines of code. Ruby has a nice gem to work with graphs. It already contains all the building blocks mentioned above. All we need to do is just stitch them together in the right order. and enjoy the results (complete source code).

Shouldn’t ld do something like this (perhaps more efficient version of this) internally?!

Reinventing the Std::wheel

| Comments

This little story is about why reinventing the wheel is not only wasteful but could also be dangerous.

On one sunny day I inserted some pretty innocent std::map into a struct, added a couple of one-liners, made sure it does what’s supposed to, committed and pushed. In some minutes I hear a scream. The damn thing crashes on a Linux. Why? It’s dead simple, it surely worked for me, there just cannot be anything wrong with it, it must work. Right?

It just didn’t. After a quick look it was pretty clear — it’s a GCC bug. Perhaps the version is too recent and it has to be crawling with bugs. I started trying other versions but the problem wouldn’t go away. Well, it’s pretty clear now — it’s a GCC family bug! That would be a glorious discovery.

We sit down together and start to minimize the example, trying to reproduce it with fewer lines of code. We are trying this and that, replacing types with simpler ones, making all kinds of changes, renaming classes, moving them to different files, rearranging the lines, etc., etc. Other people come and join us, we are three, four, five and people keep coming. Everyone’s interested, everyone’s got a theory.

There are wild versions of how the mutable keyword or a const_cast can crash the whole thing without even being called. How one should not be using STL for anything since it’s a piece of crap and has never been stable. How C++ is a rotten language and nothing even remotely close to reliable or at least deterministic could be produced with it. Many, many ridiculous theories, just because any normal ones don’t make sense.

The time goes by. It’s already the second day. It’s going to be the third soon. A bunch of people are involved. The air is electrified. We are already down to just a few lines of code. It appears that the destructor of std::map is causing all the trouble. But it’s used everywhere else and everything has been ok so far. In this case though the map is part of the structure that is inserted into a homegrown version of std::vector that is used everywhere else and has been ok so far as well.

We start to get suspicious but nobody yet dives into blasphemy and tries to condemn CVector (let’s call it that way). After all it underpins the whole foundation, it’s used everywhere and has been for years. It’s definitely better and more reliable than std::vector; it’s faster and smarter and it’s written by the cool guys (though, not exactly compatible and sometimes needs duct tape and a bit of voodoo to work). But the line is crossed and we start to poke around the CVector code. And…

There it was. One big shiny memmove in the CVector’s resize function. It seems the person who wrote that class thought there was nothing dangerous in moving some memory from one place to another. We used to do that in C all the time. It’s just a bunch of dumb bytes after all. But C++ is no C and moving an object to another place without telling it (read assigning or copy-constructing) could break it. This what apparently was happening with std::map shipped with GCC.

The problem was fixed in a minute by replacing CVector with std::vector. One minute fix which took a few people many hours to identify and track down. All of which could be avoided in the first place if someone didn’t try to reinvent the wheel driven by the not invented here syndrome.