How to decompose an ELF file's size into the sizes of its sections or symbols?


I want to know each symbol's size in an ELF executable or dynamic library, and I assume the total size of the symbols plus the size of other stuff adds up to the file size.

From the size command I can see all the section sizes, but they don't add up to the file size, and I need to know what's missing here.

File size of libfoo.so:

-rwxr-x---. 1 root root 20080 Dec 2 11:32 libfoo.so

size -A -d libfoo.so

.note.gnu.build-id        36     728
.gnu.hash                 36     768
.dynsym                  192     808
.dynstr                  233    1000
.gnu.version              16    1234
.gnu.version_r            32    1256
.rela.dyn                168    1288
.rela.plt                 72    1456
.init                     27    4096
.plt                      64    4128
.text                    232    4192
.fini                     13    4424
.rodata                   16    8192
.eh_frame_hdr             28    8208
.eh_frame                100    8240
.init_array                8   15784
.fini_array                8   15792
.data.rel.ro               8   15800
.dynamic                 544   15808
.got                      32   16352
.got.plt                  48   16384
.bss                       8   16432
.comment                  46       0
.GCC.command.line        101       0
.gnu.build.attributes    288   24632
.debug_aranges            48       0
.debug_info             1744       0
.debug_abbrev            388       0
.debug_line              161       0
.debug_str               873       0
.debug_line_str          348       0
Total                   5966

Once I have accounted for all the sections, I can analyze the main ones, such as .text and .data, with nm.


Your headline question is:

How to decompose an ELF file's size into the sizes of its sections or symbols?

You tried something and it didn't work, so your knock-on question is:

From the size command I can see all the section sizes, but they don't add up to the file size, and I need to know what's missing here.

The short answer to the headline question is that you can't decompose an ELF file size into the sizes of the sections or symbols, because:-

  • There's more to an ELF file than sections and symbols (and the symbols are in sections anyway).

  • Most frequently, the size of a section as specified in an ELF file is the size of the section's runtime memory requirement. In that case the section's size may be less than, equal to or greater than the number of bytes of file storage used for the section.

Those points will be fleshed out as we address the knock-on question.

What's missing? (1)

Some of what's missing is the stuff that size -A shouldn't count, because it isn't part of any section.

An ELF executable or shared library does not consist simply of sections. In addition to sections it contains:-

  • An ELF File Header, a structure defining global properties of the file.

  • A Program Header Table, an array of Program Header structures each of which defines the properties of one memory segment into which one or more sections are mapped.

  • A Section Header Table, an array of Section Header structures each of which defines the properties of one linkage section of the file.

  • The file may also contain padding bytes that are not part of any section or header.

The file, program and section header structures are respectively defined in <elf.h> by (Elf32_Ehdr | Elf64_Ehdr), (Elf32_Phdr | Elf64_Phdr) and (Elf32_Shdr | Elf64_Shdr). These structures are laid out in the file on boundaries that satisfy their natural alignment, as per __alignof__(ElfN_Ehdr) etc.
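To check those sizes and alignments on a given machine, a few lines of C++ against <elf.h> will do. This is a minimal sketch, assuming Linux with glibc's <elf.h>; HeaderLayout, elf64_header_layout and print_elf64_header_layout are names invented for illustration. It uses the standard alignof rather than GCC's __alignof__; for these structs on x86-64 the two agree.

```cpp
#include <cstddef>
#include <cstdio>
#include <elf.h>  // glibc: Elf64_Ehdr, Elf64_Phdr, Elf64_Shdr

// Hypothetical helper: size and natural alignment of the ELF64 header structs.
struct HeaderLayout {
    std::size_t ehdr_size, ehdr_align;
    std::size_t phdr_size, phdr_align;
    std::size_t shdr_size, shdr_align;
};

HeaderLayout elf64_header_layout() {
    return {sizeof(Elf64_Ehdr), alignof(Elf64_Ehdr),
            sizeof(Elf64_Phdr), alignof(Elf64_Phdr),
            sizeof(Elf64_Shdr), alignof(Elf64_Shdr)};
}

// Print one line per header type.
void print_elf64_header_layout() {
    const HeaderLayout h = elf64_header_layout();
    std::printf("Elf64_Ehdr: %zu bytes, align %zu\n", h.ehdr_size, h.ehdr_align);
    std::printf("Elf64_Phdr: %zu bytes, align %zu\n", h.phdr_size, h.phdr_align);
    std::printf("Elf64_Shdr: %zu bytes, align %zu\n", h.shdr_size, h.shdr_align);
}
```

On x86-64 the sizes come out as 64, 56 and 64 bytes, each 8-byte aligned, which matches what readelf -h reports for libfoo.so in the example further down.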

What's missing? (2)

The rest of what's missing is the sections that size -A should count, but doesn't.

The size command, like objdump, is not a wholly dependable parser of modern ELF files. Neither considers all the sections that may actually exist in an input ELF file, and size -A makes the same omissions as objdump -h. This is because both of these utilities (and others in binutils) rely on libbfd, the GNU Binary File Descriptor Library, to parse ELF files, and libbfd is not a wholly dependable parser of ELF files [1]. You can count on size or objdump to recognize the sections in an ELF binary that will be memory-mapped. (At least, I haven't yet seen the contrary.) But for sections that have no memory footprint and just support the work of tools, your mileage may vary. Moral: to investigate ELF binaries, prefer readelf over libbfd-based tools.

And in any case...

The size of a section is not the measure of its file storage.

A section has an alignment property: either no alignment or 2, 4, 8 ... byte alignment. A segment also has an alignment property, which may again be none, 2, 4, 8 ... or page alignment for loadable segments. In the file layout, a section N will occupy a region that is padded as necessary to make section N + 1 begin at a correctly aligned byte. That alignment constraint may stem either from the alignment of section N + 1 within the same segment or from the alignment of the next segment, in which section N + 1 comes first. Since some segments are page-aligned, the region occupied by a section might be padded with up to one byte less than a page ($ getconf PAGE_SIZE), typically 4K.
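The padding rule amounts to rounding an end offset up to the next multiple of the required alignment. Here's a minimal C++ sketch; align_up and padding_after are names invented for illustration. It covers the simple case where the next item must start at an offset that is a multiple of align. Strictly, the ELF spec only requires a loadable segment's file offset to be congruent to its virtual address modulo the segment alignment, which is how a segment can start at a non-page-aligned file offset such as 0x2de8 in the example below.

```cpp
#include <cassert>
#include <cstdint>

// Round `offset` up to a multiple of `align`. ELF uses 0 or 1 to mean
// "no constraint"; any other alignment must be a power of two.
std::uint64_t align_up(std::uint64_t offset, std::uint64_t align) {
    if (align <= 1) return offset;
    assert((align & (align - 1)) == 0 && "alignment must be a power of two");
    return (offset + align - 1) & ~(align - 1);
}

// Padding that follows an item occupying [offset, offset + size) when the
// next item must start at an `align`-aligned file offset.
std::uint64_t padding_after(std::uint64_t offset, std::uint64_t size,
                            std::uint64_t align) {
    const std::uint64_t end = offset + size;
    return align_up(end, align) - end;
}
```

With figures from the worked example below, padding_after(0x117c, 0xd, 0x1000) gives 3703 (the gap after .fini) and padding_after(0x365d, 0x10d, 8) gives 6 (the gap before the section header table).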

The size of a section may even be larger than the size of the file region it occupies, or indeed larger than the whole file. The section header defining a section may specify that its size = N, that memory is to be allocated for it at runtime, and that it uses 0 bytes on disk. The .bss section - for uninitialized (or zero-initialized) static data - is like that: static char arr[1024] contributes 1K to the size of .bss but nothing to the size of files into which it is compiled.
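The distinction between a section's declared size and the file bytes it uses can be made explicit in code. This is a sketch assuming glibc's <elf.h> on Linux; file_footprint and memory_footprint are hypothetical helper names. It keys off two section-header fields: sh_type (SHT_NOBITS means no file contents) and sh_flags (SHF_ALLOC means the section is loaded into memory).

```cpp
#include <cstdint>
#include <elf.h>

// File bytes a section occupies. SHT_NOBITS sections such as .bss have a
// non-zero sh_size but no contents in the file.
std::uint64_t file_footprint(const Elf64_Shdr& sh) {
    return (sh.sh_type == SHT_NOBITS) ? 0 : sh.sh_size;
}

// Memory a section needs at run time: only SHF_ALLOC sections are loaded,
// so e.g. .comment or .symtab use file bytes but no memory.
std::uint64_t memory_footprint(const Elf64_Shdr& sh) {
    return (sh.sh_flags & SHF_ALLOC) ? sh.sh_size : 0;
}
```

For .bss the first function returns 0 while the second returns sh_size; for a non-allocated section such as .symtab it is the other way round.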

Optional reading: How to account for the size of a shared library using readelf, and how size -A fails

Let's make a shared library libfoo.so:

$ cat foo.cpp
#include <iostream>
#include <cstring>

void foo()
{
    static char a[50000];
    std::strcpy(a,__func__);
    std::cout << a << std::endl;
}
$ g++ -shared -fPIC -Wall -Wextra -pedantic -o libfoo.so foo.cpp

This example is contrived to have a large amount of uninitialized static data.

Here's the size of libfoo.so:

$ du -b libfoo.so 
16112   libfoo.so

Let's see how size -A fares:

$ size -A libfoo.so 
libfoo.so  :
section               size    addr
.note.gnu.property      32     680
.note.gnu.build-id      36     712
.gnu.hash               36     752
.dynsym                264     792
.dynstr                306    1056
.gnu.version            22    1362
.gnu.version_r          48    1384
.rela.dyn              216    1432
.rela.plt               48    1648
.init                   27    4096
.plt                    48    4128
.plt.got                16    4176
.plt.sec                32    4192
.text                  249    4224
.fini                   13    4476
.rodata                  3    8192
.eh_frame_hdr           44    8196
.eh_frame              148    8240
.init_array              8   15848
.fini_array              8   15856
.dynamic               448   15864
.got                    48   16312
.got.plt                40   16360
.data                    8   16400
.bss                 50032   16416
.comment                37       0
Total                52217

At 52217 bytes, size -A makes the sum of the sections more than 3 times the size of the file. So it should be, because of those 50032 bytes assigned to bss that take no space in the file. Let's subtract them then. 52217 - 50032 = 2185. But that is 16112 - 2185 = 13927 bytes smaller than the actual file. This is your puzzle.

Note that every section listed is memory-mapped (it has a non-0 addr), except the .comment section. How many sections are listed? -

$ size -A libfoo.so | grep '^\.' | wc -l
26

Turning to readelf

Here's the ELF file header of libfoo.so:

$ readelf -h libfoo.so 
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          14192 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 29
  

This tells us that:

  • The size of the file header itself is 64 bytes.
  • There are 11 program headers of 56 bytes each: total 616 bytes.
  • There are 30 section headers of 64 bytes each: total 1920 bytes.
  • The program headers start at byte 64. That's right after the file header. So they continue until byte 64 + 616 = 680.
  • The section headers start at byte 14192; so they continue until byte 14192 + 1920 = 16112. That's the size of the file, so the section headers go all the way to the end.
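The same arithmetic can be done from the Elf64_Ehdr fields (e_ehsize, e_phoff, e_phnum, e_phentsize, e_shoff, e_shnum, e_shentsize) instead of by hand. A sketch assuming glibc's <elf.h>; HeaderExtents and header_extents are illustrative names.

```cpp
#include <cstdint>
#include <elf.h>

// File ranges [begin, end) taken up by the three kinds of ELF header,
// computed from the file header fields that `readelf -h` prints.
struct HeaderExtents {
    std::uint64_t ehdr_end;
    std::uint64_t phdrs_begin, phdrs_end;
    std::uint64_t shdrs_begin, shdrs_end;
};

// Note: if a file has >= SHN_LORESERVE sections, e_shnum is 0 and the real
// count lives in section 0's sh_size; this sketch ignores that case.
HeaderExtents header_extents(const Elf64_Ehdr& eh) {
    HeaderExtents x{};
    x.ehdr_end = eh.e_ehsize;
    x.phdrs_begin = eh.e_phoff;
    x.phdrs_end = eh.e_phoff + std::uint64_t{eh.e_phnum} * eh.e_phentsize;
    x.shdrs_begin = eh.e_shoff;
    x.shdrs_end = eh.e_shoff + std::uint64_t{eh.e_shnum} * eh.e_shentsize;
    return x;
}
```

Fed the readelf -h values above, it yields a program header table ending at 680, a section header table ending at 16112, and 14192 - 680 = 13512 bytes in between.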

Sandwiched between the program headers and the section headers we have 14192 - 680 = 13512 bytes still to account for. That should be for the sections.

Here are the program header details:

$ readelf -l libfoo.so

Elf file type is DYN (Shared object file)
Entry point 0x0
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000006a0 0x00000000000006a0  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000000189 0x0000000000000189  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x00000000000000c4 0x00000000000000c4  R      0x1000
  LOAD           0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000230 0x000000000000c5a8  RW     0x1000
  DYNAMIC        0x0000000000002df8 0x0000000000003df8 0x0000000000003df8
                 0x00000000000001c0 0x00000000000001c0  RW     0x8
  NOTE           0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x00000000000002c8 0x00000000000002c8 0x00000000000002c8
                 0x0000000000000024 0x0000000000000024  R      0x4
  GNU_PROPERTY   0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_EH_FRAME   0x0000000000002004 0x0000000000002004 0x0000000000002004
                 0x000000000000002c 0x000000000000002c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002de8 0x0000000000003de8 0x0000000000003de8
                 0x0000000000000218 0x0000000000000218  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.property .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 
   01     .init .plt .plt.got .plt.sec .text .fini 
   02     .rodata .eh_frame_hdr .eh_frame 
   03     .init_array .fini_array .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.gnu.property 
   06     .note.gnu.build-id 
   07     .note.gnu.property 
   08     .eh_frame_hdr 
   09     
   10     .init_array .fini_array .dynamic .got

We don't need to make much of these details, but they serve to explain some of the padding we'll find later amongst the sections.
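One detail worth checking is the fourth LOAD segment (segment 03, holding .init_array through .bss): its FileSiz is 0x230 but its MemSiz is 0xc5a8. The difference, 0xc378 = 50040 bytes, is the 50032 bytes of .bss plus the 8 bytes that keep .bss (Al 32) aligned after .data, as the section table below confirms. A one-function sketch assuming glibc's <elf.h>; zero_fill_bytes is a hypothetical name.

```cpp
#include <cstdint>
#include <elf.h>

// Bytes of a segment that the loader zero-fills instead of reading from the
// file: p_memsz - p_filesz. In a typical data segment this is .bss plus any
// alignment gap between the last file-backed section and .bss.
std::uint64_t zero_fill_bytes(const Elf64_Phdr& ph) {
    return ph.p_memsz > ph.p_filesz ? ph.p_memsz - ph.p_filesz : 0;
}
```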

And here are the section details:

$ readelf -tW libfoo.so
There are 30 section headers, starting at offset 0x3770:

Section Headers:
  [Nr] Name
       Type            Address          Off    Size   ES   Lk Inf Al
       Flags
  [ 0] 
       NULL            0000000000000000 000000 000000 00   0   0  0
       [0000000000000000]: 
  [ 1] .note.gnu.property
       NOTE            00000000000002a8 0002a8 000020 00   0   0  8
       [0000000000000002]: ALLOC
  [ 2] .note.gnu.build-id
       NOTE            00000000000002c8 0002c8 000024 00   0   0  4
       [0000000000000002]: ALLOC
  [ 3] .gnu.hash
       GNU_HASH        00000000000002f0 0002f0 000024 00   4   0  8
       [0000000000000002]: ALLOC
  [ 4] .dynsym
       DYNSYM          0000000000000318 000318 000108 18   5   1  8
       [0000000000000002]: ALLOC
  [ 5] .dynstr
       STRTAB          0000000000000420 000420 000132 00   0   0  1
       [0000000000000002]: ALLOC
  [ 6] .gnu.version
       VERSYM          0000000000000552 000552 000016 02   4   0  2
       [0000000000000002]: ALLOC
  [ 7] .gnu.version_r
       VERNEED         0000000000000568 000568 000030 00   5   1  8
       [0000000000000002]: ALLOC
  [ 8] .rela.dyn
       RELA            0000000000000598 000598 0000d8 18   4   0  8
       [0000000000000002]: ALLOC
  [ 9] .rela.plt
       RELA            0000000000000670 000670 000030 18   4  23  8
       [0000000000000042]: ALLOC, INFO LINK
  [10] .init
       PROGBITS        0000000000001000 001000 00001b 00   0   0  4
       [0000000000000006]: ALLOC, EXEC
  [11] .plt
       PROGBITS        0000000000001020 001020 000030 10   0   0 16
       [0000000000000006]: ALLOC, EXEC
  [12] .plt.got
       PROGBITS        0000000000001050 001050 000010 10   0   0 16
       [0000000000000006]: ALLOC, EXEC
  [13] .plt.sec
       PROGBITS        0000000000001060 001060 000020 10   0   0 16
       [0000000000000006]: ALLOC, EXEC
  [14] .text
       PROGBITS        0000000000001080 001080 0000f9 00   0   0 16
       [0000000000000006]: ALLOC, EXEC
  [15] .fini
       PROGBITS        000000000000117c 00117c 00000d 00   0   0  4
       [0000000000000006]: ALLOC, EXEC
  [16] .rodata
       PROGBITS        0000000000002000 002000 000003 00   0   0  1
       [0000000000000002]: ALLOC
  [17] .eh_frame_hdr
       PROGBITS        0000000000002004 002004 00002c 00   0   0  4
       [0000000000000002]: ALLOC
  [18] .eh_frame
       PROGBITS        0000000000002030 002030 000094 00   0   0  8
       [0000000000000002]: ALLOC
  [19] .init_array
       INIT_ARRAY      0000000000003de8 002de8 000008 08   0   0  8
       [0000000000000003]: WRITE, ALLOC
  [20] .fini_array
       FINI_ARRAY      0000000000003df0 002df0 000008 08   0   0  8
       [0000000000000003]: WRITE, ALLOC
  [21] .dynamic
       DYNAMIC         0000000000003df8 002df8 0001c0 10   5   0  8
       [0000000000000003]: WRITE, ALLOC
  [22] .got
       PROGBITS        0000000000003fb8 002fb8 000030 08   0   0  8
       [0000000000000003]: WRITE, ALLOC
  [23] .got.plt
       PROGBITS        0000000000003fe8 002fe8 000028 08   0   0  8
       [0000000000000003]: WRITE, ALLOC
  [24] .data
       PROGBITS        0000000000004010 003010 000008 00   0   0  8
       [0000000000000003]: WRITE, ALLOC
  [25] .bss
       NOBITS          0000000000004020 003018 00c370 00   0   0 32
       [0000000000000003]: WRITE, ALLOC
  [26] .comment
       PROGBITS        0000000000000000 003018 000025 01   0   0  1
       [0000000000000030]: MERGE, STRINGS
  [27] .symtab
       SYMTAB          0000000000000000 003040 000330 18  28  24  8
       [0000000000000000]: 
  [28] .strtab
       STRTAB          0000000000000000 003370 0002ed 00   0   0  1
       [0000000000000000]: 
  [29] .shstrtab
       STRTAB          0000000000000000 00365d 00010d 00   0   0  1
       [0000000000000000]: 

Note that readelf lists 30 sections, not the 26 reported by size. Per the section headers, section 0 is of type NULL - no name, no address, no size. This is mandatory. Reasonably enough, the null section was ignored by size -A. Section 1, .note.gnu.property, is the first non-null section and was reported by size. The final sections 27 - 29, .symtab, .strtab and .shstrtab, were also ignored by size. They have no memory footprint - Address = 0 - but they do occupy respectively 0x330, 0x2ed and 0x10d bytes in the file. That's 816 + 749 + 269 = 1834 bytes of non-null sections that size -A disregarded.

Section 1 starts at offset 0x2a8 = byte 680 = end of program headers, so the sections start right after the program headers.

The last section, 29, starts at offset 0x365d = byte 13917. It is 0x10d = 269 bytes in size, so it continues until byte 13917 + 269 = 14186. Then 6 bytes of padding take us to byte 14192 = start of section headers, which requires 8-byte alignment (__alignof__(Elf64_Shdr) == 8 on my machine). It's easily observed that various other sections are followed by some padding. That's either to satisfy the alignment requirement of the next section, as specified in the Al column - e.g. the 3-byte .rodata section is followed by 1 byte of padding so that the 4-byte-aligned .eh_frame_hdr starts at 0x2004 - or else to let the next one be aligned at the start of the next memory segment, per the program header details (Section to Segment mapping) - e.g. the .fini section. That .fini section starts at offset 0x117c and has size 0xd, so it ends at 0x1189. The next section, .rodata, is at page-aligned offset 0x2000; so there are 0x2000 - 0x1189 = 3703 bytes of padding in .fini's chunk of the file.
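The padding after each section can also be read straight off the Off and Size columns: it's the distance from where one section ends to where the next begins. A sketch using the readelf figures; SectionExtent, Gap and padding_between are hypothetical names, and NOBITS sections are meant to be entered with size 0 because they take no file bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One section's file extent, taken from the Off and Size columns of
// `readelf -tW`. Enter NOBITS sections (e.g. .bss) with size 0.
struct SectionExtent {
    std::string name;
    std::uint64_t offset;
    std::uint64_t size;
};

// Padding found between the end of one section and whatever comes next.
struct Gap {
    std::string after;   // name of the section the padding follows
    std::uint64_t bytes;
};

// `secs` must be sorted by offset; `end_of_sections` is the offset of
// whatever follows the last section (here, the section header table).
std::vector<Gap> padding_between(const std::vector<SectionExtent>& secs,
                                 std::uint64_t end_of_sections) {
    std::vector<Gap> gaps;
    for (std::size_t i = 0; i < secs.size(); ++i) {
        const std::uint64_t end = secs[i].offset + secs[i].size;
        const std::uint64_t next =
            (i + 1 < secs.size()) ? secs[i + 1].offset : end_of_sections;
        if (next > end) gaps.push_back({secs[i].name, next - end});
    }
    return gaps;
}
```

Fed sections 15 - 18 (.fini, .rodata, .eh_frame_hdr, .eh_frame), it reports 3703 bytes after .fini and 1 byte after .rodata; fed just .shstrtab with end_of_sections = 14192, it reports the 6 bytes before the section headers.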

Note the .bss section. It starts at offset 0x3018 and has size 0xc370 = 50032. That's our static char a[50000];. It has section type NOBITS, meaning no file data, and accordingly the next section, .comment, also starts at 0x3018.

So, the 16112 bytes of libfoo.so decompose into its ELF components as follows:

  • the File Header, then with no padding
  • the table of 11 Program Headers, then with no padding
  • the 28 sections that occupy space in the file (every section except the null section and .bss), some followed by padding, then
  • the table of 30 section headers.
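Putting it together, here's a sketch of a small C++ program that performs this decomposition for a 64-bit ELF file, reading the layout through <elf.h> rather than libbfd. Assumptions: Linux/glibc headers, an ELF64 file in the host's byte order, and fewer than SHN_LORESERVE sections; ElfAccount, account_elf64 and print_account are names made up for this illustration. Padding is computed as whatever bytes aren't covered by a header table or by the contents of a non-NOBITS section.

```cpp
#include <elf.h>  // Elf64_Ehdr, Elf64_Shdr, SHT_NOBITS, ELFMAG, ...
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Where the bytes of an ELF64 file go.
struct ElfAccount {
    std::uint64_t file_size = 0;
    std::uint64_t file_header = 0;      // e_ehsize
    std::uint64_t program_headers = 0;  // e_phnum * e_phentsize
    std::uint64_t section_bytes = 0;    // sh_size of every non-NOBITS section
    std::uint64_t section_headers = 0;  // e_shnum * e_shentsize
    std::uint64_t padding = 0;          // bytes covered by none of the above
};

// Returns false if `path` can't be read or isn't an ELF64 file of the kind
// this sketch handles.
bool account_elf64(const std::string& path, ElfAccount& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    const std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
    const std::uint64_t size = buf.size();

    Elf64_Ehdr eh;
    if (size < sizeof eh) return false;
    std::memcpy(&eh, buf.data(), sizeof eh);
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        (eh.e_shnum != 0 && eh.e_shentsize != sizeof(Elf64_Shdr)))
        return false;

    out = ElfAccount{};
    out.file_size = size;
    out.file_header = eh.e_ehsize;
    out.program_headers = std::uint64_t{eh.e_phnum} * eh.e_phentsize;
    out.section_headers = std::uint64_t{eh.e_shnum} * eh.e_shentsize;
    if (eh.e_phoff + out.program_headers > size ||
        eh.e_shoff + out.section_headers > size)
        return false;

    // File ranges [begin, end) holding a header table or section contents.
    std::vector<std::pair<std::uint64_t, std::uint64_t>> used;
    used.emplace_back(0, out.file_header);
    used.emplace_back(eh.e_phoff, eh.e_phoff + out.program_headers);
    used.emplace_back(eh.e_shoff, eh.e_shoff + out.section_headers);
    for (std::uint64_t i = 0; i < eh.e_shnum; ++i) {
        Elf64_Shdr sh;
        std::memcpy(&sh, buf.data() + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_type == SHT_NOBITS || sh.sh_size == 0) continue;  // no file bytes
        if (sh.sh_offset + sh.sh_size > size) return false;
        out.section_bytes += sh.sh_size;
        used.emplace_back(sh.sh_offset, sh.sh_offset + sh.sh_size);
    }

    // Padding = file size minus the union of the used ranges.
    std::sort(used.begin(), used.end());
    std::uint64_t covered = 0, cursor = 0;
    for (const auto& r : used) {
        const std::uint64_t begin = std::max(r.first, cursor);
        if (r.second > begin) {
            covered += r.second - begin;
            cursor = r.second;
        }
    }
    out.padding = size - covered;
    return true;
}

void print_account(const ElfAccount& a) {
    const auto row = [](const char* what, std::uint64_t n) {
        std::printf("%-16s %8llu\n", what, static_cast<unsigned long long>(n));
    };
    row("file header", a.file_header);
    row("program headers", a.program_headers);
    row("section data", a.section_bytes);
    row("padding", a.padding);
    row("section headers", a.section_headers);
    row("total", a.file_size);
}
```

Usage would be along the lines of ElfAccount a; if (account_elf64("libfoo.so", a)) print_account(a);. For the libfoo.so above, the hand-worked figures predict 64 + 616 + 4019 bytes of section data (2185 counted by size -A, plus the 1834 it skipped) + 9493 bytes of padding + 1920 = 16112.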

Further Reading


  1. libbfd "is a package which allows applications to use the same routines to operate on object files whatever the object file format. A new object file format can be supported simply by creating a new BFD back end and adding it to the library." A weakness of this architecture is that the BFD front end embodies an abstraction of all the supported back-end formats, and that abstraction elides ill-fitting features of some of them.