What exactly is in the .o / .a / .so file?

I was wondering what exactly is stored in the .o or .so file that results from compiling a C++ program. This post gives a nice overview of the compilation process and of the role of the .o file in it, and as far as I understand from this post, .a and .so files are just several .o files concatenated into one file that is linked statically (.a) or dynamically (.so).

But I wanted to check if I understand correctly what is stored in such a file. After compiling the following code

void f();
void f2(int);

const int X = 25;

void g() {
  f();
  f2(X);
}

void h() {
  g();
}

      

I would expect to find the following items in the .o file:

  • Machine code for g(), containing some placeholder addresses where f() and f2(int) are called.
  • Machine code for h(), with no placeholders.
  • Machine code for X, which will just be the number 25.
  • Some table that indicates at which addresses in the file the symbols g(), h() and X can be found.
  • Another table that indicates which placeholders were used to refer to the undefined symbols f() and f2(int), which must be resolved when linking.

Then a program like nm would list all the symbol names from both tables.

I suppose the compiler could optimize the call f2(X) by calling f2(25) instead, but it would still need to store X in the .o file, since there is no way to know whether it will be used from another .o file.

Is that right? Is it the same for .a and .so files?

Thank you for your help!

+3


2 answers


You are pretty much right about the general idea for object files. In the "table that indicates at which addresses in the file" I would replace "addresses" with "offsets", but that is just wording.
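To make the two tables from the question concrete, here is a rough toy model, not how a real linker is implemented. ToyObject, ToyReloc, toy_link and the fake CALL/RET "opcodes" are all invented for illustration. Each object has a symbol table (name to offset) and a list of placeholders, and the "linker" lays the objects out back to back and patches the placeholders with final addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: real ELF objects have several sections and typed relocations.
struct ToyReloc {
    std::size_t offset;   // where in `text` the placeholder sits
    std::string symbol;   // which symbol's address belongs there
};

struct ToyObject {
    std::vector<std::uint32_t> text;              // stand-in for machine code
    std::map<std::string, std::size_t> defined;   // symbol -> offset into `text`
    std::vector<ToyReloc> relocs;                 // placeholders still to be patched
};

// Hypothetical linker step: lay the objects out back to back starting at
// `base`, then overwrite every placeholder with the symbol's final address.
std::vector<std::uint32_t> toy_link(const std::vector<ToyObject>& objs,
                                    std::uint32_t base) {
    std::map<std::string, std::uint32_t> address;
    std::vector<std::size_t> start;   // where each object landed in the image
    std::vector<std::uint32_t> image;
    for (const ToyObject& obj : objs) {
        start.push_back(image.size());
        for (const auto& sym : obj.defined)
            address[sym.first] =
                base + static_cast<std::uint32_t>(image.size() + sym.second);
        image.insert(image.end(), obj.text.begin(), obj.text.end());
    }
    for (std::size_t i = 0; i < objs.size(); ++i) {
        for (const ToyReloc& r : objs[i].relocs) {
            auto it = address.find(r.symbol);
            if (it == address.end())
                throw std::runtime_error("undefined reference to " + r.symbol);
            image[start[i] + r.offset] = it->second;
        }
    }
    return image;
}

// Demo: the first object mirrors the question's file (g calls f and f2),
// the second one supplies f and f2.
std::vector<std::uint32_t> toy_link_demo() {
    const std::uint32_t CALL = 0xCA11, RET = 0xC3, HOLE = 0;
    ToyObject a{{CALL, HOLE, CALL, HOLE, RET}, {{"g", 0}}, {{1, "f"}, {3, "f2"}}};
    ToyObject b{{RET, RET}, {{"f", 0}, {"f2", 1}}, {}};
    return toy_link({a, b}, 0x1000);
}
```

Calling toy_link_demo() from a main() returns a seven-word image in which the two holes at offsets 1 and 3 have been replaced by the final addresses of f and f2. Note that the tables store offsets relative to the start of each object, which is exactly why "offsets" is the better word.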

.a files are just archives (an old format that predates tar but does much the same thing). You could replace .a files with tar files if you taught the linker to unpack them and simply link all the .o files they contain (more or less; there is a bit more logic to avoid linking object files from the archive that are not needed, but that's just an optimization).
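The "don't link members that aren't needed" logic can be sketched roughly like this. It is a simplified toy, not the algorithm of any particular linker; Member and select_members are made-up names. A member is pulled in only if it defines a symbol that is currently undefined, and pulling it in may add new undefined symbols, so the archive is rescanned until nothing changes.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy description of one .o file inside an archive.
struct Member {
    std::string name;
    std::set<std::string> defines;   // symbols this member provides
    std::set<std::string> needs;     // symbols this member references
};

// Returns the members that get linked, in the order they were pulled in,
// starting from the set of symbols that are undefined so far.
std::vector<std::string> select_members(const std::vector<Member>& archive,
                                        std::set<std::string> undefined) {
    std::set<std::string> defined;
    std::vector<bool> taken(archive.size(), false);
    std::vector<std::string> pulled;
    bool progress = true;
    while (progress) {               // rescan until a pass pulls nothing new
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (taken[i]) continue;
            bool useful = false;
            for (const std::string& s : archive[i].defines)
                if (undefined.count(s)) { useful = true; break; }
            if (!useful) continue;   // defines nothing we are missing: skip it
            taken[i] = true;
            progress = true;
            pulled.push_back(archive[i].name);
            for (const std::string& s : archive[i].defines) {
                defined.insert(s);
                undefined.erase(s);
            }
            for (const std::string& s : archive[i].needs)
                if (!defined.count(s)) undefined.insert(s);
        }
    }
    return pulled;
}
```

For example, with members helper.o, f.o (which needs helper), f2.o and unused.o, and f and f2 initially undefined, the function returns f.o, f2.o, helper.o and never touches unused.o.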

.so files are different. They are closer to the final binary than to an object file. A .so file with all symbols resolved can, at least in theory, be run as a program. In fact, with PIE (position-independent executables) the difference between a shared library and a program is (at least in theory) a few bits in the header. Both contain instructions telling the dynamic linker how to load the file (more or less the same instructions as for a regular program) and a relocation table telling the dynamic linker how to resolve external symbols (again, the same as in a program). All unresolved symbols in a dynamic library (and in a program) are accessed through indirection tables, which are populated at dynamic link time (program start or dlopen).
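To see how close these file kinds are at the header level, here is a small sketch that classifies a file from the first 18 bytes of its ELF header. The function name elf_kind is made up; the layout it relies on (magic bytes, byte order at index 5, the two-byte e_type field at offset 16) is from the ELF specification. A .o file reports ET_REL, while both shared libraries and PIE executables report ET_DYN.

```cpp
#include <cstddef>
#include <string>

// Classify an ELF file from the start of its header.
// `hdr` must point at the first bytes of the file, `len` is how many are valid.
std::string elf_kind(const unsigned char* hdr, std::size_t len) {
    if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return "not ELF";
    // e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    const bool little = (hdr[5] == 1);
    // e_type is a 16-bit field stored right after the 16-byte e_ident array.
    const unsigned lo = little ? hdr[16] : hdr[17];
    const unsigned hi = little ? hdr[17] : hdr[16];
    const unsigned type = (hi << 8) | lo;
    switch (type) {
        case 1: return "ET_REL";    // relocatable object (.o)
        case 2: return "ET_EXEC";   // executable linked at a fixed address
        case 3: return "ET_DYN";    // shared object, or a PIE executable
        default: return "other";
    }
}
```

To try it on real files, read the first 18 bytes with std::ifstream in binary mode and pass the buffer to elf_kind.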



To simplify, the difference between objects and shared libraries is that a lot more work has been done in a shared library to avoid text relocation (this is not strictly necessary or enforced, but it is the general idea). In object files the assembler has only generated placeholders for addresses, which the linker then fills in. In a shared library the addresses are filled in with addresses of jump tables, so that the text of the library does not need to be changed, only a limited jump table.
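The jump-table idea can be imitated in plain C++ with a table of function pointers. This is only an analogy for what ELF does with its GOT/PLT; g_slots, bind_slot, lib_g and the two f2 implementations are invented names. The "library text" (lib_g) is never modified. Only the table slot is filled in at "load time".

```cpp
#include <cstddef>

using Fn = int (*)(int);

// Toy indirection table: one slot per external function the "library" calls.
constexpr std::size_t kSlotF2 = 0;
static Fn g_slots[1] = {nullptr};

// "Library text": it always calls through the slot, so its code stays
// identical no matter where f2 finally lives.
int lib_g(int x) { return g_slots[kSlotF2](x); }

// Toy stand-in for the dynamic linker: resolving a symbol just writes a slot.
void bind_slot(std::size_t slot, Fn target) { g_slots[slot] = target; }

// Two possible providers of f2, e.g. from two different libraries.
int f2_impl_a(int v) { return v + 1; }
int f2_impl_b(int v) { return v * 2; }
```

After bind_slot(kSlotF2, f2_impl_a), lib_g(25) goes to the first implementation. Rebinding the slot to f2_impl_b redirects the same unchanged lib_g to the second one. Because the code itself is untouched, the same library text can stay read-only and be shared between processes.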

By the way, I'm talking about ELF here. Older formats had more differences between programs and libraries.

+5



What you described in your question (machine code for functions, initialized data, and relocation tables) pretty much matches what is inside .o (object) and .so (shared object) files.

.a files (archives) are basically multiple .o (object) files bundled together for convenient use at link time ("link libraries").

.so (shared object) files include some additional metadata, such as which other .so files need to be linked in. (xyz.so may refer to functions that live in abc.so, so the information that abc.so needs to be linked, plus optionally the path where abc.so can be found (RPATH), has to be encoded in xyz.so.)
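As a rough sketch of what that dependency metadata is used for: given each library's list of needed libraries, a loader has to bring in the whole dependency closure, each library once. NeededMap and load_order below are hypothetical names, and the breadth-first order is an assumption made for illustration, not a claim about any specific dynamic linker.

```cpp
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Library name -> names of the libraries it declares as needed.
using NeededMap = std::map<std::string, std::vector<std::string>>;

// Walk the dependency graph breadth-first from `root`,
// listing every library exactly once.
std::vector<std::string> load_order(const NeededMap& needed,
                                    const std::string& root) {
    std::vector<std::string> order;
    std::set<std::string> seen{root};
    std::deque<std::string> queue{root};
    while (!queue.empty()) {
        const std::string cur = queue.front();
        queue.pop_front();
        order.push_back(cur);
        auto it = needed.find(cur);
        if (it == needed.end()) continue;        // no recorded dependencies
        for (const std::string& dep : it->second)
            if (seen.insert(dep).second)         // true only on first sighting
                queue.push_back(dep);
    }
    return order;
}
```

With xyz.so needing abc.so and libc.so, and abc.so needing libc.so, load_order(needed, "xyz.so") gives xyz.so, abc.so, libc.so: libc.so appears once even though two libraries ask for it.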



Windows .dll (dynamic link library) files are basically shared objects (.so) under a different name.

Disclaimer: This is a heavy simplification, but it is close enough to "The Truth (tm)" for the day-to-day work of developers.

+1

