Confusion around alignment and padding of strings in .rodata


Taking this simple C program

const char s1[] = "hello",
           s2[] = "there";

and compiling it using gcc -c a.c -O0 -o a.o

yields a .rodata section containing the following:

'hello\x00there\x00'

This is what I expect: each string occupies 6 bytes, for 12 bytes in total.


However, if I change the second string to "there s", like so:

const char s1[] = "hello",
           s2[] = "there s";

.rodata then contains the following:

'hello\x00\x00\x00there s\x00'

Two extra null padding bytes were added after s1.

I assume they were added to align the first string to an 8-byte boundary (since I'm on a 64-bit platform), though I may be wrong.

My question, then: why wasn't that done in the first example? Why weren't 2 extra padding bytes added to the end of each string to bring them to an 8-byte boundary?


All examples were conducted on an amd64/linux/gcc machine.

There are 1 best solutions below

Internally, GCC calculates alignment in bits and converts to bytes when printing a .align directive. In this answer I'll use bytes for everything, with the corresponding internal alignment in bits in parentheses.

Initially, both strings have an alignment of 1 byte (8 bits internally in GCC). You can check the GIMPLE dump to confirm this.

If you look at the i386 port, you will see that DATA_ALIGNMENT is defined as ix86_data_alignment. align_variable (in varasm.c) uses this function to raise the alignment of strings of 8 bytes or more to between 8 and 32 bytes, depending on their size (between 64 and 256 bits internally in GCC).
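To make the size dependence concrete, here is a minimal sketch of such a mapping. It is not GCC's ix86_data_alignment: the helper name sketch_data_alignment and the exact 8/16/32-byte thresholds are assumptions for illustration, chosen only to match the 8-to-32-byte range described above. The real function also depends on target and tuning options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT GCC code. It mirrors the idea that larger
   objects get a larger alignment. The thresholds are assumed. */
size_t sketch_data_alignment(size_t size_bytes)
{
    assert(size_bytes > 0);            /* a string always holds its '\0' */
    if (size_bytes >= 32) return 32;   /* 256 bits internally */
    if (size_bytes >= 16) return 16;   /* 128 bits internally */
    if (size_bytes >= 8)  return 8;    /*  64 bits internally */
    return 1;                          /*   8 bits: left at the default */
}
```

With this sketch, sizeof "hello" (6 bytes) keeps the 1-byte alignment, while sizeof "there s" (8 bytes, counting the terminating null) is raised to 8 bytes.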

After that, you can see in assemble_variable (varasm.c) that ASM_OUTPUT_ALIGN, which prints the .align directive, is only called if the alignment is greater than BITS_PER_UNIT, which is 1 byte by default (8 bits internally in GCC).
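The check described above can be sketched as follows. This is an illustration rather than the code in varasm.c: sketch_align_directive and SKETCH_BITS_PER_UNIT are hypothetical names, and the directive text assumes the byte-count form of .align mentioned in this answer.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_BITS_PER_UNIT 8u  /* assumed: 8 bits per addressable byte */

/* Hypothetical helper, NOT GCC code. Takes an alignment in bits, writes
   the directive text into buf (or an empty string when the alignment does
   not exceed one byte), and returns the alignment in bytes. */
unsigned sketch_align_directive(unsigned align_bits, char *buf, size_t buf_len)
{
    assert(align_bits % SKETCH_BITS_PER_UNIT == 0);
    unsigned align_bytes = align_bits / SKETCH_BITS_PER_UNIT;

    if (align_bits > SKETCH_BITS_PER_UNIT)
        snprintf(buf, buf_len, ".align %u", align_bytes);
    else if (buf_len > 0)
        buf[0] = '\0';               /* 1-byte alignment: nothing printed */

    return align_bytes;
}
```

Passing 8 (bits) leaves the buffer empty, which corresponds to the 6-byte strings getting no directive. Passing 64 produces ".align 8".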

You can find the DATA_ALIGNMENT definition in https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/i386.h
You can find ix86_data_alignment in https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/i386.c
You can find assemble_variable and align_variable in https://github.com/gcc-mirror/gcc/blob/master/gcc/varasm.c

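So if you declare a string whose size is 8 bytes or more, it will be aligned: you will see a .align x, with x between 8 and 32 bytes depending on the size of the string. In your first example both strings are 6 bytes, so they keep the 1-byte alignment and no padding is inserted. In the second example s2 is 8 bytes (including the terminating null), so it is aligned to 8 bytes, which is why 2 padding bytes follow s1. As @Peter Cordes said, this is easier to see in the assembly output.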
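The padding arithmetic for the two .rodata dumps in the question can be checked with a small sketch. The helper names align_up and padding_before are illustrative and not part of GCC. The alignments passed in (1 and 8 bytes) are the ones discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Number of padding bytes needed before an object with the given
   alignment can be placed at or after offset. */
size_t padding_before(size_t offset, size_t align)
{
    return align_up(offset, align) - offset;
}
```

With s1 occupying offsets 0 to 5, padding_before(6, 1) is 0 for the 6-byte "there" (12 bytes in total), while padding_before(6, 8) is 2 for the 8-byte "there s" (6 + 2 + 8 = 16 bytes in total).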