What exactly is in a .o / .a / .so file?

I was wondering what exactly is stored in a .o or a .so file that results from compiling a C++ program. This post gives quite a good overview of the compilation process and the role of a .o file in it. As far as I understand from it, .a and .so files are just multiple .o files merged into a single file that is linked statically (.a) or dynamically (.so).

But I wanted to check whether I understand correctly what is stored in such a file. After compiling the following code:

void f();
void f2(int);

const int X = 25;

void g() {
  f();
  f2(X);
}

void h() {
  g();
}

I would expect to find the following items in the .o file:

  • Machine code for g(), containing some placeholder addresses where f() and f2(int) are called.
  • Machine code for h(), with no placeholders
  • Machine code for X, which would be just the number 25
  • Some kind of table that specifies at which addresses in the file the symbols g(), h() and X can be found
  • Another table that specifies which placeholders were used to refer to the undefined symbols f() and f2(int), which have to be resolved during linking.

Then a program like nm would list all the symbol names from both tables.

I suppose that the compiler could optimize the call f2(X) by calling f2(25) instead, but it would still need to keep the symbol X in the .o file since there is no way to know if it will be used from a different .o file.

Would that be about correct? Is it the same for .a and .so files?

Thanks for your help!

There are 2 best solutions below

Best answer

You're pretty much correct in the general idea for object files. In the "table that specifies at which addresses in the file" I would replace "addresses" with "offsets", but that's just wording.
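To make the "machine code plus two tables" picture concrete, here is a minimal sketch in C++ of what such an object file holds. Everything in it is hypothetical: the struct and function names, the byte values and the offsets are invented for illustration, and a real ELF object has many more sections and fields. The symbol names assume Itanium C++ ABI mangling (what GCC and Clang use on Linux). X is left out on purpose: a namespace-scope const int has internal linkage in C++, so the compiler may fold the 25 into the call and emit no symbol for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of a relocatable object file.
struct Symbol {
    std::string name;    // mangled name, as nm would print it
    bool defined;        // false: must be provided by another object
    std::size_t offset;  // offset into the text; meaningless if !defined
};

struct Relocation {
    std::size_t offset;  // where in the text the placeholder sits
    std::string target;  // symbol whose address belongs there
};

struct ObjectFile {
    std::vector<std::uint8_t> text;       // machine code with placeholders
    std::vector<Symbol> symbols;          // defined and undefined symbols
    std::vector<Relocation> relocations;  // placeholders still to be filled
};

// Roughly what nm reports: T for a defined text symbol, U for undefined.
std::vector<std::string> list_symbols(const ObjectFile& obj) {
    std::vector<std::string> out;
    for (const Symbol& s : obj.symbols) {
        out.push_back(std::string(s.defined ? "T " : "U ") + s.name);
    }
    return out;
}

// What a static linker does with one relocation: overwrite the placeholder.
// A single byte stands in for a full address in this model.
void apply_relocation(ObjectFile& obj, const Relocation& r, std::uint8_t value) {
    assert(r.offset < obj.text.size());
    obj.text[r.offset] = value;
}

// Model of the object for the question's code; offsets and bytes are made up.
ObjectFile example_object() {
    ObjectFile obj;
    obj.text.assign(32, 0x00);  // filler bytes, not real machine code
    obj.symbols = {
        {"_Z1gv", true, 0},    // g()
        {"_Z1hv", true, 16},   // h()
        {"_Z1fv", false, 0},   // f(), undefined here
        {"_Z2f2i", false, 0},  // f2(int), undefined here
    };
    obj.relocations = {{1, "_Z1fv"}, {10, "_Z2f2i"}};
    return obj;
}
```

Calling list_symbols(example_object()) gives one line per symbol, defined or not, which is the view nm presents; the relocations are a separate list that nm does not show.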

.a files are simply archives (ar is an old format that predates tar, but does the same job). You could replace .a files with tar files as long as you taught the linker to unpack them and link with the .o files they contain. That is a slight simplification: there is a little extra logic to skip object files in the archive that aren't needed, but that's just an optimization.
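The "skip what isn't needed" logic can be sketched roughly like this. It is only a model under stated assumptions: Member and select_members are hypothetical names, symbols are plain strings, and a real linker works from the archive's symbol index rather than looping over members like this.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one .o file stored inside a .a archive.
struct Member {
    std::string name;
    std::set<std::string> defines;  // symbols this object provides
    std::set<std::string> needs;    // symbols this object leaves undefined
};

// Pick only the members that define a symbol which is still undefined.
// Pulling a member in can add new undefined symbols, so repeat until
// a full pass over the archive adds nothing.
std::vector<std::string> select_members(const std::vector<Member>& archive,
                                        std::set<std::string> undefined) {
    std::vector<std::string> picked;
    std::vector<bool> used(archive.size(), false);
    std::set<std::string> defined;
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (used[i]) continue;
            const Member& m = archive[i];
            bool wanted = false;
            for (const std::string& d : m.defines) {
                if (undefined.count(d) != 0) { wanted = true; break; }
            }
            if (!wanted) continue;  // nothing here is needed (yet)
            used[i] = true;
            picked.push_back(m.name);
            progress = true;
            for (const std::string& d : m.defines) {
                undefined.erase(d);
                defined.insert(d);
            }
            for (const std::string& n : m.needs) {
                if (defined.count(n) == 0) undefined.insert(n);
            }
        }
    }
    assert(picked.size() <= archive.size());
    return picked;
}
```

With the object from the question, the starting undefined set would hold the symbols for f() and f2(int); any member of the archive that defines neither of them, and nothing they depend on, is never linked in.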

.so files are different. They are closer to a final binary than to an object file. A .so file with all symbols resolved can, at least theoretically, be run as a program. In fact, with PIE (position-independent executables) the difference between a shared library and a program is (at least in theory) just a few bits in the header. They contain instructions telling the dynamic linker how to load the library (more or less the same instructions as in a normal program) and a relocation table telling the dynamic linker how to resolve the external symbols (again, the same as in a program). All unresolved symbols in a dynamic library (and in a program) are accessed through indirection tables, which get populated at dynamic linking time (program start or dlopen).

If we simplify this a lot, the difference between objects and shared libraries is that much more work has been done in the shared library to avoid text relocation (this is not strictly necessary or enforced, but it's the general idea). In an object file, the assembler has only generated placeholders for addresses, which the linker then fills in. In a shared library, those placeholders have already been filled in with references to jump tables, so the text of the library doesn't need to change at load time, only a small jump table does.
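Here is a rough sketch of the jump-table idea, with ordinary function pointers standing in for the real tables. All names (SharedLib, bind_imports, call_import, twice, plus_one) are hypothetical, and the real mechanism is considerably more involved; the point is only that the "text" never changes and binding touches nothing but the table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Fn = int (*)(int);

// Hypothetical model of a loaded shared library.
struct SharedLib {
    std::vector<std::string> imports;  // slot i belongs to imports[i]
    std::vector<Fn> table;             // jump table, filled at load time
};

// Stand-in for the library's text: it never holds a real address,
// it only knows which table slot to call through.
int call_import(const SharedLib& lib, std::size_t slot, int arg) {
    assert(slot < lib.table.size() && lib.table[slot] != nullptr);
    return lib.table[slot](arg);
}

// Stand-in for the dynamic linker: look up each imported name among the
// exported symbols and write the address into the table.
// Returns false if a symbol cannot be resolved.
bool bind_imports(SharedLib& lib, const std::map<std::string, Fn>& exported) {
    lib.table.assign(lib.imports.size(), nullptr);
    for (std::size_t i = 0; i < lib.imports.size(); ++i) {
        auto it = exported.find(lib.imports[i]);
        if (it == exported.end()) return false;
        lib.table[i] = it->second;
    }
    return true;
}

// Two functions that some other library is imagined to export.
int twice(int x) { return 2 * x; }
int plus_one(int x) { return x + 1; }
```

After bind_imports succeeds, call_import(lib, 0, 21) goes through slot 0 to whatever was bound to the first import. Rebinding the table to different functions changes the behaviour without touching call_import, which is the property that lets the library text stay read-only and shared between processes.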

Btw, I'm talking about ELF here. Older formats had more differences between programs and libraries.

Other answer

What you described in your question (machine code for functions, initialization data and relocation tables) is pretty much exactly what is inside .o (object) and .so (shared object) files.

.a (archives) are basically multiple .o (object) files bunched together for easier reference during linking. ("Link libraries")

.so (shared object) files include some additional metadata, like which other .so's would need to be linked in. (xyz.so might reference some functions that reside in abc.so, and the information that abc.so would need to be linked in, plus optionally the path where to find abc.so (the RPATH), need to be encoded in xyz.so.)
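As a sketch of how that "needs abc.so" metadata gets used, the following models each library's list of needed libraries (the DT_NEEDED entries in ELF terms) as a map and walks it breadth-first. NeededMap and load_order are hypothetical names, path lookup via RPATH is ignored entirely, and the breadth-first order is an assumption about typical loader behaviour rather than a guarantee.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Library name -> names it lists as needed (a model of DT_NEEDED entries).
using NeededMap = std::map<std::string, std::vector<std::string>>;

// Walk the dependency lists breadth-first from `root`, loading each
// library once even if several others list it as needed.
std::vector<std::string> load_order(const NeededMap& needed,
                                    const std::string& root) {
    std::vector<std::string> order;
    std::set<std::string> seen{root};
    std::deque<std::string> queue{root};
    while (!queue.empty()) {
        std::string lib = queue.front();
        queue.pop_front();
        order.push_back(lib);
        auto it = needed.find(lib);
        if (it == needed.end()) continue;  // no recorded dependencies
        for (const std::string& dep : it->second) {
            if (seen.insert(dep).second) queue.push_back(dep);
        }
    }
    assert(!order.empty());  // the root itself is always loaded
    return order;
}
```

For the example in the text, a map in which xyz.so lists abc.so as needed yields xyz.so first and abc.so second.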

Windows .dll (dynamic link library) files play basically the same role as shared objects (.so), under a different name and in a different file format (PE rather than ELF).

Disclaimer: This is simplifying things significantly, but is close enough to "The Truth (tm)" to serve for everyday developer needs.