Patching arm64 binary to replace all 'call' instructions to point to a specific function

How do I replace all the function calls in an arm64 binary with a call to a specific function? The intent is to insert an indirection so that I can log all function calls.

Example:

mov     x29, sp
mov     w0, #10
bl      bar(int)
...
# Replace "bl bar" with "bl my_func". my_func will now take all the parameters and forward them to bar.

mov     x29, sp
mov     w0, #10
bl      my_func(...)

The replacement function prints the pointer to the function, and then invokes the callee with the provided arguments. I'm also not sure how this forwarding will work for all the cases but the intent is to have something like this:

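#include <cstdio>
#include <functional>
#include <utility>

template<class F, class... Args>
void my_func(F&& f, Args&&... args) {
    // Log the callee's address, then forward the call unchanged.
    printf("calling: %p\n", reinterpret_cast<void*>(f));
    std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
}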

There is 1 best solution below


TL;DR: write asm wrapper functions that call a C++ void logger(void *fptr) which returns. Don't try to tailcall from C++, because that's not possible in the general case.

An alternate approach might be to "hook" every callee, instead of redirecting at the call site. But then you'd miss calls to functions in libraries you weren't instrumenting.


I don't think C++ lets you forward any/all args without knowing what they are. That's easy to do in asm for a specific calling convention, since the final invocation of the real function can be a tailcall jump, with the return address, all arg-passing registers, and the stack pointer set up how they were on entry. But only if you're not trying to remove an arg.

So instead of having C++ do the tailcall to the real function, have asm wrappers just call a logging function. Either printf directly, or a function like extern "C" void log_call(void *fptr); which returns. It's compiled normally so it'll follow the ABI, so the hand-written asm trampoline / wrapper function knows what it needs to restore before jumping.


Capturing the target address

bl my_func won't put the address of bar anywhere.

For direct calls you could use the return address (in lr) to look up the target, e.g. in a hash table. Otherwise you'd need a separate trampoline for every function you're hooking. (Modifying the code to hook at the target function instead of the call sites wouldn't have this problem, but you'd have to replace the first instruction with a jump somewhere which logs and then returns. And which does whatever that replaced first instruction did. Or replace the first couple instructions with one that saves the return address and then calls.)
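The return-address lookup for direct calls could be sketched as below. It assumes the patching tool recorded the original target of every `bl` it rewrote; the names `CallSiteTable`, `record`, and `lookup` are hypothetical and only illustrate the idea.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical table filled in by the patching tool. For each rewritten
// `bl`, it maps the return address (call-site address + 4, which is what
// lr holds inside the stub) to the original call target.
class CallSiteTable {
public:
    // call_site is the address of the patched `bl` instruction itself.
    void record(std::uint64_t call_site, std::uint64_t original_target) {
        by_return_addr_[call_site + 4] = original_target;
    }

    // lr is the value of x30 on entry to the stub. Returns 0 if unknown.
    std::uint64_t lookup(std::uint64_t lr) const {
        auto it = by_return_addr_.find(lr);
        return it == by_return_addr_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> by_return_addr_;
};
```

A stub would pass lr to a C++ function that calls `lookup`, logs the result, and returns the target for the final tail-jump.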

But any indirect calls like blr x8 will need a special stub. Probably one trampoline stub for each different possible register that holds a function address.
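To pick the right per-register stub, the patcher has to know which register each indirect call uses. A minimal sketch, assuming only the plain `blr xN` form matters (the pointer-authenticating variants such as blraa use different encodings and are not matched); the function name `blr_register` is made up for illustration.

```cpp
#include <cstdint>

// If `insn` is a plain `blr xN`, return the register number N taken from
// bits [9:5] of the encoding; otherwise return -1.
// `blr xN` is encoded as 0xD63F0000 | (N << 5).
int blr_register(std::uint32_t insn) {
    if ((insn & 0xFFFFFC1Fu) != 0xD63F0000u) return -1;
    return static_cast<int>((insn >> 5) & 0x1Fu);
}
```

For example, `blr x8` is the word 0xD63F0100, so the patcher would replace it with a `bl` to the x8 stub.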

Those stubs will need to be written in asm.

If you were trying to call a wrapper in C++ the way you imagined, it would be tricky because the real args might be using all the register-arg slots (x0 to x7 on AArch64). And changing the stack pointer to add a stack arg makes it a new 9th arg or something weird. So it works much better just to call a C++ function to do the logging, then restore all the arg-passing registers which you saved on the stack. (16 bytes at a time with stp.)
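A rough sketch of one such stub, for a call site that was `blr x8` and has been rewritten to `bl log_stub_x8`. The stub name and the `g_last_logged` variable are hypothetical, the asm is only assembled on AArch64 Linux, and it ignores SVE registers, so treat it as an illustration of the save / log / restore / tail-jump pattern rather than a finished implementation.

```cpp
#include <cstdio>

// Ordinary C++ logger. It follows the ABI, so it may clobber any
// caller-saved register; the stub saves what the real callee needs.
static void* g_last_logged = nullptr;  // kept only to make the sketch testable

extern "C" void log_call(void* fptr) {
    g_last_logged = fptr;
    std::fprintf(stderr, "calling: %p\n", fptr);
}

#if defined(__aarch64__) && defined(__linux__)
asm(R"(
    .text
    .global log_stub_x8
    .type   log_stub_x8, %function
log_stub_x8:
    stp  x29, x30, [sp, #-16]!    // lr still holds the original return address
    stp  x0, x1,   [sp, #-16]!    // integer arg registers, 16 bytes at a time
    stp  x2, x3,   [sp, #-16]!
    stp  x4, x5,   [sp, #-16]!
    stp  x6, x7,   [sp, #-16]!
    stp  x8, x9,   [sp, #-16]!    // x8 = call target; x9 only pads the pair
    stp  q0, q1,   [sp, #-32]!    // FP/SIMD arg registers
    stp  q2, q3,   [sp, #-32]!
    stp  q4, q5,   [sp, #-32]!
    stp  q6, q7,   [sp, #-32]!
    mov  x0, x8                   // log_call(target)
    bl   log_call
    ldp  q6, q7,   [sp], #32
    ldp  q4, q5,   [sp], #32
    ldp  q2, q3,   [sp], #32
    ldp  q0, q1,   [sp], #32
    ldp  x8, x9,   [sp], #16
    ldp  x6, x7,   [sp], #16
    ldp  x4, x5,   [sp], #16
    ldp  x2, x3,   [sp], #16
    ldp  x0, x1,   [sp], #16
    ldp  x29, x30, [sp], #16
    br   x8                       // tail-jump: callee returns to the original caller
    .size log_stub_x8, .-log_stub_x8
)");
#endif
```

Because sp and lr are back to their entry values before the `br`, any stack args are where the callee expects them and the callee returns straight to the original call site.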

That also avoids the problem of trying to make a transparent function with C++.


Removing one arg and forwarding the rest

Your design requires my_func to remove one arg and then forward an unknown number of other args of unknown type to another function. That's not even possible in ARM64 asm, therefore not surprising that C++ doesn't have syntax that would require the compiler to do it.

If the arg was actually a void* or function pointer, it would take one register, so removing it would move the next 7 regs down (x1 to x0, etc.) and the first stack arg then goes in x7. But the stack has to stay 16-byte aligned, so you can't load just that one 8-byte arg and leave the later stack args in the right place.

A workaround for that in some cases would be to make that f arg 16 bytes, so it takes two registers. Then you can move x2..x7 down to x0..x5, and ldp 16 bytes of stack args into x6 and x7. Except what if one of those stack args was one that always gets passed in memory, not registers, e.g. part of an even larger object, or non-POD, or whatever criterion the C++ ABI uses to make sure an object always has an address.

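So maybe f could be 32 bytes so it goes on the stack, and can be removed without touching arg-passing registers or needing to pull any stack args back into registers. That would be the case under a convention like x86-64 System V, but AAPCS64 passes composites larger than 16 bytes by reference, so on AArch64 f would still take up a register for the pointer to the caller's copy.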

Of course in the real case you didn't have a C++ function that can add a new first arg and then pass on all the rest either. That's something you could again only do in special cases, like passing on an f.

It's something you could do in asm on 32-bit x86 with a pure stack-args calling convention and no stack-alignment requirement; you can move the return address up one slot and jump, so you eventually return to the original call-site with the stack pointer restored to how it was before calling the trampoline that added a new first arg and copied the return address lower.

But C++ won't have any constructs that impose requirements on ABIs beyond what C does.


Scanning a binary for bl instructions

That will miss any tailcalls that use b instead of bl. That might be ok, but if not I don't see a way to fix it reliably: unconditional b will be all over the place inside functions. (With some heuristics for identifying functions, a b to somewhere outside the current function can be assumed to be a tailcall, while others aren't, since compilers usually make all the code for a single function contiguous. Except when some blocks go in a .text.cold section if the compiler identifies them as unlikely.)
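The heuristic from the parenthetical could look roughly like this, assuming function boundaries are already known (e.g. from symbol sizes). `FunctionRange` and `looks_like_tailcall` are hypothetical names, and a jump into a .text.cold block would be misclassified as a tailcall.

```cpp
#include <cstdint>

// Half-open address range [start, end) of one function.
struct FunctionRange {
    std::uint64_t start;
    std::uint64_t end;
};

// True if `insn` is an unconditional `b` (top six bits 000101).
bool is_b(std::uint32_t insn) { return (insn & 0xFC000000u) == 0x14000000u; }

// Target of a `b` or `bl` at address `pc`: pc + sign_extend(imm26) * 4.
std::uint64_t branch_target(std::uint64_t pc, std::uint32_t insn) {
    std::int64_t imm26 = insn & 0x03FFFFFFu;
    if (imm26 & 0x02000000) imm26 -= 0x04000000;  // sign-extend 26 bits
    return pc + static_cast<std::uint64_t>(imm26 * 4);
}

// Heuristic: a `b` that leaves the containing function is probably a tailcall.
bool looks_like_tailcall(std::uint64_t pc, std::uint32_t insn, FunctionRange fn) {
    if (!is_b(insn)) return false;
    std::uint64_t target = branch_target(pc, insn);
    return target < fn.start || target >= fn.end;
}
```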

AArch64 has fixed-width instructions that require alignment, so consistent disassembly of the compiler-generated instructions is easy, unlike x86. So you can identify all the bl instructions.
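Since every instruction is an aligned 4-byte word, finding `bl` comes down to a mask-and-compare on each word. A minimal sketch, assuming little-endian code bytes and a known load address; `CallSite` and `find_bl` are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct CallSite {
    std::uint64_t address;  // virtual address of the `bl`
    std::uint64_t target;   // virtual address it calls
};

// Scan `size` bytes of AArch64 code that will be mapped at address `base`.
std::vector<CallSite> find_bl(const unsigned char* code, std::size_t size,
                              std::uint64_t base) {
    std::vector<CallSite> sites;
    for (std::size_t off = 0; off + 4 <= size; off += 4) {
        // Assemble the little-endian instruction word byte by byte.
        std::uint32_t insn = static_cast<std::uint32_t>(code[off]) |
                             static_cast<std::uint32_t>(code[off + 1]) << 8 |
                             static_cast<std::uint32_t>(code[off + 2]) << 16 |
                             static_cast<std::uint32_t>(code[off + 3]) << 24;
        if ((insn & 0xFC000000u) != 0x94000000u) continue;  // not `bl`
        std::int64_t imm26 = insn & 0x03FFFFFFu;
        if (imm26 & 0x02000000) imm26 -= 0x04000000;        // sign-extend
        std::uint64_t pc = base + off;
        sites.push_back({pc, pc + static_cast<std::uint64_t>(imm26 * 4)});
    }
    return sites;
}
```

Any data word in the scanned range whose top six bits happen to be 100101 would also be reported, which is the false-positive risk with literal pools.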

But if AArch64 compilers mix in any constant data between functions, like 32-bit ARM compilers do (literal pools for PC-relative loads), false positives are possible even if you limit it to looking at parts of the binary that are in executable ELF sections. (Or program segments if section headers have been stripped.)

I don't think bl gets used for anything other than function calls in compiler-generated code. (e.g. not to private helper functions the compiler invented.)

You might want a library to help parse ELF headers and find the right binary offsets. Looking for bl instructions might be something you do by scanning the machine code, not disassembly.
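Once the virtual address of a `bl` is known, pointing it at a stub means re-encoding its 26-bit word offset. A sketch under the assumption that the stub lies within the instruction's reach of about 128 MiB either way; `retarget_bl` is a hypothetical helper name.

```cpp
#include <cstdint>

// Re-encode the `bl` at address `pc` so that it calls `new_target`.
// Returns false and leaves `insn` untouched if it is not a `bl`, or if the
// target is misaligned or outside the signed 28-bit byte offset range.
bool retarget_bl(std::uint32_t& insn, std::uint64_t pc, std::uint64_t new_target) {
    if ((insn & 0xFC000000u) != 0x94000000u) return false;
    std::int64_t delta = static_cast<std::int64_t>(new_target - pc);
    if (delta % 4 != 0) return false;
    if (delta < -(1LL << 27) || delta >= (1LL << 27)) return false;
    // delta is a multiple of 4, so the division is exact.
    insn = 0x94000000u | (static_cast<std::uint32_t>(delta / 4) & 0x03FFFFFFu);
    return true;
}
```

If the stub can't be placed in range, the patcher would need a nearby veneer that loads the full address and branches, which this sketch doesn't attempt.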


If you're modifying compiler asm output before even assembling, that would make some things easier; you could add instructions at call sites. But that's not an option for existing binaries that you can't recompile from source.