This code, when compiled for the x86_64-unknown-linux-musl
target, produces a .got
section:
fn main() {
println!("Hello, world!");
}
$ cargo build --release --target x86_64-unknown-linux-musl
$ readelf -S hello
There are 30 section headers, starting at offset 0x26dc08:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[12] .got PROGBITS 0000000000637b58 00037b58
00000000000004a8 0000000000000008 WA 0 0 8
...
According to this answer for analogous C code, the .got
section is an artifact that can be safely removed. However, it segfaults for me:
$ objcopy -R.got hello hello_no_got
$ ./hello_no_got
[1] 3131 segmentation fault (core dumped) ./hello_no_got
Looking at the disassembly, I see that the GOT basically holds static function addresses:
$ objdump -d hello -M intel
...
0000000000400340 <_ZN5hello4main17h5d434a6e08b2e3b8E>:
...
40037c: ff 15 26 7a 23 00 call QWORD PTR [rip+0x237a26] # 637da8 <_GLOBAL_OFFSET_TABLE_+0x250>
...
$ objdump -s -j .got hello | grep 637da8
637da8 50434000 00000000 b0854000 00000000 PC@.......@.....
$ objdump -d hello -M intel | grep 404350
0000000000404350 <_ZN3std2io5stdio6_print17h522bda9f206d7fddE>:
404350: 41 57 push r15
The number 404350
comes from 50434000 00000000
, which is a little-endian 0x00000000000404350
(this was not obvious; I had to run the binary under GDB to figure this out!)
This is perplexing, since Wikipedia says that
[GOT] is used by executed programs to find during runtime addresses of global variables, unknown in compile time. The global offset table is updated in process bootstrap by the dynamic linker.
- Why is the GOT present? From the disassembly, it looks like the compiler knows all the needed addresses. As far as I know, there is no bootstrap done by the dynamic linker: there is neither
INTERP
norDYNAMIC
program headers present in my binary; - Why does the GOT store function pointers? Wikipedia says the GOT is only for global variables, and function pointers should be contained in the PLT.
TL;DR summary: the GOT is really a rudimentary build artifact, which I was able to get rid of via simple machine code manipulations.
Breakdown
If we look at
and search for
GLOBAL
, we see only four distinct types of references to the GOT (constants differ):All of these are reading instructions, which means that the GOT is not modified at runtime. This in turn means that we can statically resolve the addresses that the GOT refers to! Let's consider the reference types one by one:
call QWORD PTR [rip+0x2126be]
simply says "go to address[rip+0x2126be]
, take 8 bytes from there, interpret them as a function address and call the function". We can simply replace this instruction with a direct call:Notice the
nop
at the end: we need to replace all the 6 bytes of the machine code that constitute the first instruction, but the instruction we replace it with is only 5 bytes, so we need to pad it. Fundamentally, as we are patching a compiled binary, we can replace an instruction with a another one only if it is not longer.jmp QWORD PTR [rip+0x21265f]
is the same as the previous one, but instead of calling an address it jumps to it. This turns into:cmp rbx,QWORD PTR [rip+0x21a5b4]
- this takes 8 bytes from[rip+0x21a5b4]
and compares them to the contents ofrbx
register. This one is tricky, sincecmp
can not compare register contents to an 64-bit immediate value. We could use another register for that, but we don't know which of the registers are used around this instruction. A careful solution would be something likeBut that would be way beyond our limit of 7 bytes. The real solution stems from an observation that the GOT contains only addresses; our address space is (roughly) contained in range [0x400000; 0x650000], which can be seen in the program headers:
It follows that we can (mostly) get away with only comparing 4 bytes of a GOT entry instead of 8. So the substitution is:
objdump
output, since 8 bytes do not fit in one line:It just compares 8 bytes of the GOT to a constant (in this case, 0x0). In fact, we can do the comparison statically; if the operands compare equal, we replace the comparison with
Obviously, a register is always equal to itself. A lot of padding needed here!
If the left operand is greater than the right one, we replace the comparison with
In practice,
rsp
is always greater than zero.If the left operand is smaller than the right one, things get a bit more complicated, but since we have a whole lot of bytes (8!) we can manage:
Notice that the second and the third instructions use
eax
instead ofrax
, sincecmp
andxor
involvingeax
take one less byte than withrax
.Testing
I have written a Python script to do all these substitutions automatically (it's a bit hacky and relies on parsing of
objdump
output though):Now we can get rid of the GOT for real:
I have also tested it on my ~3k LOC app, and it seems to work alright.
P.S. I am not an expert in assembly, so some of the above might be inaccurate.