Use raw binary blob of instructions in C code


I'm doing JIT compilation with TCC (Tiny C Compiler), but it has limited support for assembly and I frequently get stuck on this. Is there some kind of trick to insert raw instructions into inline assembly? Such as:

mov -0x18(%rbp), %rax
finit
flds (%rax)

/* Custom unsupported binary instructions here */

flds (%rcx)

I know it won't be easily maintainable, but I would like to keep TCC unmodified.


There are 2 best solutions below

0
On BEST ANSWER

If it supports standard GAS / Unix-assembler directives like .byte 0x00, 0x12, you can emit any byte sequence you want. (Also .word or .long if you want to write a 16-bit or 32-bit immediate as a single number.)

GNU as manual
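As a minimal sketch of the directive approach, assuming an x86 target and a compiler that accepts GNU C extended asm (GCC or Clang; whether a given TCC build accepts every directive is not verified here), the function below emits `mov $42, %eax` purely as data. The name `load_42_raw` is hypothetical, and the non-x86 branch is only a placeholder so the snippet still compiles elsewhere.

```c
/* Hypothetical helper: executes a hand-encoded "mov $42, %eax". */
static int load_42_raw(void)
{
    int result;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile (
        ".byte 0xb8\n\t"   /* opcode B8: mov imm32 into %eax */
        ".long 42"         /* the 32-bit immediate, written as one number */
        : "=a" (result));  /* tell the compiler the result is in %eax */
#else
    result = 42;           /* placeholder for targets this sketch does not cover */
#endif
    return result;
}
```

Calling `load_42_raw()` should return 42. Writing the immediate with .long rather than four .byte values keeps the number readable and lets the assembler handle the little-endian layout.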

5
On

GNU as supports two nice ways to work with raw instructions. There are .byte, .short, .long and similar directives that can be used to directly emit the raw numbers in the output.
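Closer to the x87 code in the question, ordinary mnemonics and raw bytes can be mixed in one asm statement. The sketch below assumes an x86 target with GCC or Clang; `one_plus_one_raw` is a hypothetical name, and the `fadd` is hand-encoded only to stand in for an instruction the assembler does not know.

```c
/* Hypothetical helper: pushes 1.0 on the x87 stack, then doubles it
 * with a hand-encoded instruction. */
static double one_plus_one_raw(void)
{
    double r;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile (
        "fld1\n\t"              /* regular mnemonic: push 1.0 */
        ".byte 0xd8, 0xc0"      /* raw bytes for: fadd %st(0), %st */
        : "=t" (r));            /* result is left on top of the x87 stack */
#else
    r = 1.0 + 1.0;              /* placeholder for targets this sketch does not cover */
#endif
    return r;
}
```

The assembler treats the .byte line as opaque data, so it performs no checking: a wrong byte silently becomes a different instruction.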

Another way (only implemented for a few architectures) is the .insn directive that allows for a middle way between ordinary assembly instructions and the raw numbers mentioned above. See for example the documentation for RISC-V .insn formats.

To use either of them within C you can use the (non-standard) asm statement (or variations of it). The following example uses GCC's naked function attribute (also only available on some architectures) to prevent any additional instructions from being emitted by the compiler (e.g., function prologue/epilogue), which can easily trash registers.

This example is for RISC-V and shows standard 32-bit instructions as well as compressed 16-bit instructions:

void __attribute__ ((naked)) rawinst (void) {
  __asm__ volatile ("li t5,1");
  __asm__ volatile (".short 0x4f05"); // li t5,1
  __asm__ volatile ("lui t5, 0x1c000");
  __asm__ volatile (".word 0x1c000f37"); // lui t5, 0x1c000
  __asm__ volatile (".insn r 0x33, 0, 0, a0, a1, a2"); // add a0, a1, a2
  __asm__ volatile ("ret");
}

If you want this inside a non-naked function, use GNU C Extended asm with constraints / clobbers to tell the compiler what your asm does and where the inputs/outputs are, and of course don't use your own ret. That applies at least to gcc/clang; if you're actually using TCC, it doesn't optimize at all between C statements, so it should be safe to use Basic asm statements.
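A minimal sketch of that extended-asm variant, assuming x86 with GCC or Clang (`raw_add` is a hypothetical name). Raw bytes hard-code their registers, so the constraints have to pin each operand to exactly the register the encoding uses:

```c
/* Hypothetical helper: adds y to x using a hand-encoded instruction. */
static int raw_add(int x, int y)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile (
        ".byte 0x01, 0xc8"   /* raw bytes for: add %ecx, %eax */
        : "+a" (x)           /* x is read and written in %eax */
        : "c" (y)            /* y must be in %ecx, as the encoding expects */
        : "cc");             /* add modifies the flags */
#else
    x += y;                  /* placeholder for targets this sketch does not cover */
#endif
    return x;
}
```

With these constraints `raw_add(2, 3)` should return 5. A generic "r" constraint would not work here, because the compiler could pick a register other than the one baked into the bytes.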