GCC/LD position-independent code with instruction-relative data access

Motivation

Suppose I had:

int       some_bss_values[8];
int       some_data_values[] = {1,2,3,4,5,6,7,8};
int const some_rodata_values[] = {9,10,11,12,13,14,15,16};

//Some silly code that shows usage of bss, data, and rodata
int some_function(int x) {
   some_bss_values[x] = some_rodata_values[x];
   return some_data_values[x]++;
}

My ultimate goal is to compile this to a binary blob that I could load at runtime (this is an embedded system, so no dynamic linkers or even ELF loaders). Specifically, I want to be able to load this blob, data included, at any address and jump to some_function.
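For concreteness, the loading step I have in mind looks roughly like the sketch below. The names load_blob, blob_entry_fn and entry_offset are made up for illustration; the sketch assumes the destination memory is executable and ignores any instruction-cache maintenance a real target may need.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signature of some_function as seen by the loader (illustrative name). */
typedef int (*blob_entry_fn)(int);

/* Copy the raw image to an arbitrary destination and return a callable
   pointer to its entry point. entry_offset is the offset of the entry
   symbol from the start of the image (hypothetical parameter). */
static blob_entry_fn load_blob(void *dest, const void *image,
                               size_t image_size, size_t entry_offset)
{
    memcpy(dest, image, image_size);
    /* Converting a data address to a function pointer is not covered by
       ISO C, but it is the usual approach on bare-metal targets. */
    return (blob_entry_fn)((uintptr_t)dest + entry_offset);
}
```

The point is that nothing is patched after the copy, so every data reference inside the blob has to work regardless of the value of dest.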

What I tried

I tried writing a simple linker script:

SECTIONS {
    .text : {
        *(.text);
    }
    .data : { 
        *(.bss); /*Placed here because I explicitly want the BSS to be part of the image*/
        *(.data); 
    } =0
    .rodata : { *(.rodata); }
}

I compiled the example code to an ELF with:

# Using -g and -O0 so we can have a readable disassembly. I'm actually using
# a cross-compiler but we'll use regular gcc with x86 for the sake of the question

gcc -Wl,-esome_function -o t.elf -fPIC -nostdlib -T my_linker_script.ld -g -O0 my_code.cpp

Then I generated an image with:

objcopy -Obinary -j.text -j.data -j.rodata t.elf t.bin

The problem

The above commands produce a binary blob for me to use, including the explicit zeroes for the BSS, but looking at the disassembly highlights a problem:

objdump -sSxC t.elf

...
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .note.gnu.build-id 00000024  0000000000000000  0000000000000000  00200000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         00000053  0000000000000024  0000000000000024  00200024  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .data         00000040  0000000000000080  0000000000000080  00200080  2**5
                  CONTENTS, ALLOC, LOAD, DATA
...

Disassembly of section .text:

0000000000000024 <some_function(int)>:
int       some_bss_values[8];
int       some_data_values[] = {1,2,3,4,5,6,7,8};
int const some_rodata_values[] = {9,10,11,12,13,14,15,16};

//Some silly code that shows usage of bss, data, and rodata
int some_function(int x) {
  24:   55                      push   %rbp
  25:   48 89 e5                mov    %rsp,%rbp
  28:   89 7d fc                mov    %edi,-0x4(%rbp)
   some_bss_values[x] = some_rodata_values[x];
  2b:   8b 45 fc                mov    -0x4(%rbp),%eax
  2e:   48 98                   cltq   
  30:   48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
  37:   00 
  38:   48 8d 05 a1 00 00 00    lea    0xa1(%rip),%rax        # e0 <some_rodata_values>
  3f:   8b 0c 02                mov    (%rdx,%rax,1),%ecx
  42:   48 c7 c0 80 00 00 00    mov    $0x80,%rax
  49:   8b 55 fc                mov    -0x4(%rbp),%edx
  4c:   48 63 d2                movslq %edx,%rdx
  4f:   89 0c 90                mov    %ecx,(%rax,%rdx,4)
   return some_data_values[x]++;
  52:   48 c7 c0 a0 00 00 00    mov    $0xa0,%rax
  59:   8b 55 fc                mov    -0x4(%rbp),%edx
  5c:   48 63 d2                movslq %edx,%rdx
  5f:   8b 04 90                mov    (%rax,%rdx,4),%eax
  62:   8d 70 01                lea    0x1(%rax),%esi
  65:   48 c7 c2 a0 00 00 00    mov    $0xa0,%rdx
  6c:   8b 4d fc                mov    -0x4(%rbp),%ecx
  6f:   48 63 c9                movslq %ecx,%rcx
  72:   89 34 8a                mov    %esi,(%rdx,%rcx,4)
}
  75:   5d                      pop    %rbp
  76:   c3                      retq  

Here we see that the read from rodata correctly uses instruction-relative (RIP-relative) addressing. The BSS and .data accesses, however, use hardcoded absolute addresses: 0x80 for the BSS array and 0xA0 for the data array.
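To spell out why the rodata access survives relocation, here is the displacement arithmetic as a small sketch. The helper name rip_relative_target is mine; the numbers come from the listing above.

```c
#include <stdint.h>

/* Target of a RIP-relative operand: the displacement is added to the
   address of the next instruction, i.e. the instruction's own address
   plus its encoded length. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the lea at 0x38 (7 bytes long, displacement 0xa1), this gives 0x38 + 7 + 0xa1 = 0xe0, which matches the objdump annotation for some_rodata_values. If the whole image is loaded 0x1000 higher, the same instruction resolves to 0x10e0, so it keeps pointing at the array. The mov $0x80 and mov $0xa0 immediates do not change with the load address, which is exactly the problem.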

How can I instruct gcc/ld to use instruction-relative addressing for data in the BSS and .data sections?
