Store and retrieve version information in ELF file


I'm trying to figure out a good way to store and retrieve version information in C / C++ executables and libraries on Linux. I'm using the GCC compiler for my C and C++ programs.

The storage part is pretty straightforward; declaring a variable like this stores it in the .rodata section of the output file:

const char MY_VERSION[] = "some_version_information";

However, I'm having an incredibly difficult time with retrieving the information from an external program. With shared libraries, it is fairly easy to use dlopen and dlsym to load a library and look up a symbol, but this might not be the best way to do it, and it won't work at all for executables. Also, if possible, I would like this to work with executables and libraries built for a different architecture.

I figure that since shared libraries and executables both use the ELF format, it makes sense to use a library that knows how to read ELF files. The two I was able to find are libelf and BFD, but I'm struggling to find decent documentation for either. Is there perhaps a better library to use?

This is what I have so far, using BFD:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bfd.h>

int main(int argc, char* argv[])
{
    const char *filename;
    int i;
    long storage; // bfd_get_symtab_upper_bound returns a long, negative on error
    bfd *b = NULL;
    asymbol **symbol_table;
    long num_symbols;

    if(argc != 2) return 1; // todo: print a useful message
    else filename = argv[1];

    // BFD must be initialized before any other BFD call
    bfd_init();

    b = bfd_openr(filename, NULL);

    if(b == NULL){
        fprintf(stderr, "Error: failed to open %s\n", filename);
        return 1;
    }

    // make sure we're opening a file that BFD understands
    if(!bfd_check_format(b, bfd_object)){
        fprintf(stderr, "Error: unrecognized format\n");
        return 1;
    }

    // how much memory is needed to store the symbol table
    storage = bfd_get_symtab_upper_bound(b);

    if(storage < 0){
        fprintf(stderr, "Error: unable to find storage bound of symbol table\n");
        return 1;
    } else if((symbol_table = malloc(storage)) == NULL){
        fprintf(stderr, "Error: failed to allocate memory for symbol table\n");
        return 1;
    } else {
        num_symbols = bfd_canonicalize_symtab(b, symbol_table);
    }

    for(i = 0; i < num_symbols; i++){
        if(strcmp(symbol_table[i]->name, "MY_VERSION") == 0){
            fprintf(stderr, "found MY_VERSION\n");

            // todo: print the string?
        }
    }

    return 0;
}

I realize that printing the string may not be very simple due to the ELF format.

Is there a straightforward way to print a string symbol that is stored in an ELF file?


There are 3 best solutions below

Best answer

I figured out that I could use a custom section to store the version information, and then just dump the section to 'extract' the string.

Here's how the version information should be declared:

__attribute__((section("my_custom_version_info"))) const char MY_VERSION[] = "some.version.string";

Then, in the program using BFD, we can get the section a few different ways. We can use bfd_get_section_by_name:

asection *section = bfd_get_section_by_name(b, "my_custom_version_info");

Now that we have a handle to the section, we can load it into memory. I chose to use bfd_malloc_and_get_section (you should make sure section isn't NULL first):

bfd_byte *buf;
if(!bfd_malloc_and_get_section(b, section, &buf)){
    // error: failed to malloc or read the section
}

Now that we have the section loaded into a buffer, we can print its contents:

// section->size is unsigned, so use a matching index type
for(bfd_size_type i = 0; i < section->size && buf[i] != '\0'; i++){
    putchar(buf[i]);
}
putchar('\n');

Don't forget to free the buffer with free() once you are done with it.
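If linking against BFD is not an option, the same custom section can also be read by walking the ELF section headers directly. The following is only a rough sketch using the structures from <elf.h>, not the BFD approach described above. It assumes a 64-bit ELF file in the host's byte order, so it would need extra work for 32-bit or foreign-endian targets. read_at and read_elf_section are hypothetical helper names, and the "used" attribute is an addition to keep GCC from dropping the otherwise unreferenced variable.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same declaration as above; "used" asks GCC to emit it even if unreferenced. */
__attribute__((section("my_custom_version_info"), used))
const char MY_VERSION[] = "some.version.string";

/* Read len bytes at file offset off into a malloc'd, NUL-terminated buffer. */
static void *read_at(FILE *f, unsigned long long off, size_t len)
{
    char *p = malloc(len + 1);
    if (p == NULL)
        return NULL;
    if (fseek(f, (long)off, SEEK_SET) != 0 || fread(p, 1, len, f) != len) {
        free(p);
        return NULL;
    }
    p[len] = '\0';
    return p;
}

/*
 * Return a malloc'd copy of the contents of the named section, or NULL.
 * Assumption: 64-bit ELF in host byte order (sketch only).
 */
char *read_elf_section(const char *path, const char *name)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;
    char *result = NULL;
    FILE *f;

    assert(path != NULL && name != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Validate the ELF header, then load the section header table. */
    if (fread(&eh, 1, sizeof eh, f) == sizeof eh &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_shentsize == sizeof(Elf64_Shdr) &&
        eh.e_shstrndx < eh.e_shnum)
        sh = read_at(f, eh.e_shoff, (size_t)eh.e_shnum * sizeof *sh);

    /* Section names live in the section-header string table. */
    if (sh != NULL)
        names = read_at(f, sh[eh.e_shstrndx].sh_offset,
                        sh[eh.e_shstrndx].sh_size);

    if (names != NULL) {
        for (size_t i = 0; i < eh.e_shnum && result == NULL; i++) {
            if (sh[i].sh_name < sh[eh.e_shstrndx].sh_size &&
                sh[i].sh_type != SHT_NOBITS &&
                strcmp(names + sh[i].sh_name, name) == 0)
                result = read_at(f, sh[i].sh_offset, sh[i].sh_size);
        }
    }

    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

Calling read_elf_section(path, "my_custom_version_info") on a file built with the declaration above should hand back the version string; the caller frees the result.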

Answer

From inside your executable, just declare

 extern const char MY_VERSION[];

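BTW, in C++ it is better to declare that symbol extern "C" (even in the file defining it), because a const object at namespace scope has internal linkage in C++ by default and would otherwise not show up as a global symbol.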

Then your problem is how to find the symbol MY_VERSION in some external ELF executable (the easy way could be to popen an nm process, see nm(1)). BTW, the lookup is the same for a function symbol as for a data one. You could use a library such as libelf or libelfin (or the venerable libbfd), or parse the ELF format yourself (be sure to read the ELF wikipage first).

You should learn and understand the ELF format. Read the documentation on ELF and on the x86-64 ABI carefully. Explore existing ELF executables with objdump(1) & readelf(1). Also read elf(5). Read how symbol tables are represented and how their hash codes are computed. Of course, read about all the possible relocations in detail. You could read Levine's book Linkers and Loaders and Drepper's paper How to Write Shared Libraries (both explain ELF), and also the Assembler Language HowTo, Ian Taylor's paper on gold, and ELF: better symbol lookup via DT_GNU_HASH. See also the Solaris documentation, e.g. on the Hash Table Section, and the OSDEV ELF tutorial & ELF pages.

You don't need any specific section (or segment).
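To make the "parse it yourself" route a bit more concrete, here is a rough sketch of looking up a data symbol by name in .symtab and copying its bytes out of the file. It makes several assumptions: a 64-bit ELF file in the host's byte order, an unstripped file (so .symtab is present), and a symbol that lives in a section with file contents. read_at and read_elf_symbol are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Plain declaration from the question; no custom section involved. */
const char MY_VERSION[] = "some_version_information";

/* Read len bytes at file offset off into a malloc'd, NUL-terminated buffer. */
static void *read_at(FILE *f, unsigned long long off, size_t len)
{
    char *p = malloc(len + 1);
    if (p == NULL)
        return NULL;
    if (fseek(f, (long)off, SEEK_SET) != 0 || fread(p, 1, len, f) != len) {
        free(p);
        return NULL;
    }
    p[len] = '\0';
    return p;
}

/* Return a malloc'd copy of the bytes of the named symbol, or NULL. */
char *read_elf_symbol(const char *path, const char *symname)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL, *symsec = NULL;
    Elf64_Sym *syms = NULL;
    char *strtab = NULL, *result = NULL;
    FILE *f;

    assert(path != NULL && symname != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    if (fread(&eh, 1, sizeof eh, f) == sizeof eh &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_shentsize == sizeof(Elf64_Shdr))
        sh = read_at(f, eh.e_shoff, (size_t)eh.e_shnum * sizeof *sh);

    /* .symtab is the static symbol table; it is absent in stripped files. */
    for (size_t i = 0; sh != NULL && i < eh.e_shnum; i++)
        if (sh[i].sh_type == SHT_SYMTAB && sh[i].sh_link < eh.e_shnum &&
            sh[i].sh_entsize == sizeof(Elf64_Sym))
            symsec = &sh[i];

    /* sh_link of the symbol table names the string table holding the names. */
    if (symsec != NULL) {
        syms = read_at(f, symsec->sh_offset, symsec->sh_size);
        strtab = read_at(f, sh[symsec->sh_link].sh_offset,
                         sh[symsec->sh_link].sh_size);
    }

    if (syms != NULL && strtab != NULL) {
        size_t n = symsec->sh_size / sizeof *syms;
        for (size_t i = 0; i < n && result == NULL; i++) {
            const Elf64_Sym *s = &syms[i];
            const Elf64_Shdr *sec;

            if (s->st_name >= sh[symsec->sh_link].sh_size ||
                strcmp(strtab + s->st_name, symname) != 0 ||
                s->st_shndx == SHN_UNDEF || s->st_shndx >= eh.e_shnum)
                continue;
            sec = &sh[s->st_shndx];
            if (sec->sh_type == SHT_NOBITS || s->st_value < sec->sh_addr ||
                s->st_value - sec->sh_addr + s->st_size > sec->sh_size)
                continue;
            /* File offset = section's file offset + symbol's offset in it. */
            result = read_at(f, sec->sh_offset + (s->st_value - sec->sh_addr),
                             s->st_size);
        }
    }

    free(strtab);
    free(syms);
    free(sh);
    fclose(f);
    return result;
}
```

Calling read_elf_symbol(path, "MY_VERSION") should return the version string for an unstripped file; the caller frees the result. Since only file offsets are used, nothing from the target is executed, but handling other architectures would still mean dealing with ELF32 and byte swapping, which this sketch skips.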

(I did that about 20 years ago for Sparc; it is not particularly hard)

You could also look into the Emacs source code; its unexec.c writes an ELF file.

BTW, ELF has some versioning info with symbols, see e.g. dlvsym(3)

You may also want to understand how execve(2) or ld-linux(8) works, what is the virtual address space of a process (see proc(5), try cat /proc/$$/maps)

Answer

The traditional way of doing this is via SCCS what(1) strings. See https://pubs.opengroup.org/onlinepubs/9699919799/utilities/what.html.
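For illustration, what(1) looks for the marker "@(#)" and prints the text that follows it up to the first ", >, newline, backslash, or NUL byte. Because it only scans raw bytes, it works on executables and libraries for any architecture. Below is a minimal sketch of both sides: a version string carrying the marker, and a small scanner that mimics the search. The version text "myprog 1.2.3" and the helper name find_what_string are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Embedding side: the "@(#)" prefix is the marker that what(1) searches for. */
const char MY_VERSION[] = "@(#)myprog 1.2.3";

/*
 * Copy the first identification string found in buf into out.
 * The text after "@(#)" ends at the first '"', '>', newline, backslash,
 * or NUL byte (or when out is full). Returns 1 if a marker was found,
 * 0 otherwise.
 */
int find_what_string(const unsigned char *buf, size_t len,
                     char *out, size_t outsz)
{
    static const char marker[] = "@(#)";
    const size_t mlen = sizeof marker - 1;

    assert(buf != NULL && out != NULL && outsz > 0);

    for (size_t i = 0; i + mlen <= len; i++) {
        size_t n = 0;

        if (memcmp(buf + i, marker, mlen) != 0)
            continue;

        /* Copy until a terminator, the end of input, or a full buffer. */
        for (size_t j = i + mlen; j < len && n + 1 < outsz; j++) {
            unsigned char c = buf[j];
            if (c == '"' || c == '>' || c == '\n' || c == '\\' || c == '\0')
                break;
            out[n++] = (char)c;
        }
        out[n] = '\0';
        return 1;
    }
    return 0;
}
```

To use it on a real file, read the file into memory and pass the bytes to find_what_string. Where the what utility is installed, running it on the binary should report the same string.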