I'm writing some ARM64 assembly code for macOS, and it needs to access a global variable.
I tried to use the solution in this SO answer, and it works fine if I just call the function as is. However, my application needs to patch some instructions of this function, and the way I'm doing it, the function gets moved somewhere else in memory in the process. Note the adrp/ldr pair is untouched during patching.
However, if I try to run the function after moving it elsewhere in memory, it no longer returns correct results. This happens even if I just memcpy() the code as is, without patching. After tracing with a debugger, I isolated the issue to the address of the global variable being incorrectly loaded by the adrp/ldr pair (and weirdly, the ldr is assembled as an add, as seen with objdump straight after compiling the binary -- not sure if it's somehow related to the issue here.)
What would be the correct way to load a global variable, so that it survives the function being copied somewhere else and run from there?
There's the issue. If you rip code out of the binary it's in, then you effectively need to re-link it.
There are two ways of dealing with this:
If you have complete control over the segment layout, then you could have one executable segment with all of your assembly in it, and right next to it one segment with all addresses that code needs, and make sure the assembly ONLY has references to things on that page. Then wherever you copy your assembly, you'd also copy the data page next to it. This would enable you to make use of static addresses that get rebased by the dynamic linker at the time your binary is loaded. This might look something like:
Compile that with `-Wl,-segprot,__ASM,rx,rx` and you'll get an executable `__ASM` and a writeable `__REF` segment. Those two would have to maintain their relative position to each other when they get copied around.

(Note that on arm64 macOS you cannot put symbol references into executable segments for the dynamic linker to rebase, because it will fault and crash while trying to do so, and even if it were able to do that, it would invalidate the code signature.)
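To see why the two segments have to keep their relative position, it helps to spell out what `adrp` actually computes: the 4 KiB page of the current PC plus a signed page count that the static linker baked into the instruction. The following is a minimal C sketch of that arithmetic only. The function names and the addresses in the note below are made up for illustration, and nothing here touches real Mach-O segments.

```c
#include <stdint.h>

/* Value `adrp` writes to its destination register: the 4 KiB page of the
 * current PC plus a signed page count encoded in the instruction. */
static uint64_t adrp_result(uint64_t pc, int64_t page_delta)
{
    return (pc & ~0xFFFull) + (uint64_t)(page_delta * 4096);
}

/* Page count that would be encoded for code at `pc` referring to data at
 * `data_addr` (hypothetical helper, mirrors what the static linker does). */
static int64_t adrp_page_delta_for(uint64_t pc, uint64_t data_addr)
{
    return (int64_t)((data_addr & ~0xFFFull) - (pc & ~0xFFFull)) / 4096;
}
```

With code at `0x100004010` and its data page at `0x100008000`, the encoded page count is 4. If the code is copied to `0x280000010` and the instruction is left untouched, `adrp` now yields `0x280004000`, so that is exactly where the copy of the data page has to live for the reference to stay valid.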
You act as a linker, scanning for PC-relative instructions and re-linking them as you go. The list of PC-relative instructions in arm64 is quite short, so it should be a feasible amount of work:
- `adr` and `adrp`
- `b` and `bl`
- `b.cond` (and `bc.cond` with FEAT_HBC)
- `cbz` and `cbnz`
- `tbz` and `tbnz`
- `ldr` and `ldrsw` (literal)
- `ldr` (SIMD & FP literal)
- `prfm` (literal)

(You can look for the string `PC[]` in the ARMv8 Reference Manual to find all uses.)

For each of those you'd have to check whether their target address lies within the range that's being copied or not. If it does, then you'd leave the instruction alone (unless you copy the code to a different offset within the 4K page than it was before, in which case you have to fix up `adrp` instructions). If it isn't, then you'll have to recalculate the offset and emit a new instruction. Some of the instructions have a really low maximum offset (`tbz`/`tbnz` ±32KiB). But usually the only instructions that reference addresses across function boundaries are `adr`, `adrp`, `b`, `bl` and `ldr`. If all code on the page is written by you then you can do `adrp`+`add` instead of `adr` and `adrp`+`ldr` instead of just `ldr`, and if you have compiler-generated code on there, then all `adr`'s and `ldr`'s will have a `nop` before or after, which you can use to turn them into an `adrp` combo. That should get your maximum reference range up to ±128MiB.
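As a rough illustration of the "recalculate the offset and emit a new instruction" step, here is a minimal C sketch for the two most common cases, `adrp` and `b`/`bl`. It assumes you already know the old and new address of each instruction and that the target lies outside the copied range. All function names are hypothetical, and the other instructions from the list (`adr`, `ldr` literal, `cbz`, `tbz`, ...) would need analogous handling that is not shown.

```c
#include <stdint.h>

/* adrp: bit 31 = 1, bits 28:24 = 10000, immlo in 30:29, immhi in 23:5. */
static int is_adrp(uint32_t insn) { return (insn & 0x9F000000u) == 0x90000000u; }

/* b / bl: bits 30:26 = 00101, bit 31 selects bl, imm26 in 25:0. */
static int is_branch(uint32_t insn) { return (insn & 0x7C000000u) == 0x14000000u; }

/* Page address an adrp at `pc` resolves to. */
static uint64_t adrp_target_page(uint32_t insn, uint64_t pc)
{
    int64_t imm21 = (int64_t)((((insn >> 5) & 0x7FFFFu) << 2) | ((insn >> 29) & 0x3u));
    if (imm21 & (1 << 20)) imm21 -= (1 << 21);          /* sign-extend 21 bits */
    return (pc & ~0xFFFull) + (uint64_t)(imm21 * 4096);
}

/* Address a b/bl at `pc` jumps to. */
static uint64_t branch_target(uint32_t insn, uint64_t pc)
{
    int64_t imm26 = (int64_t)(insn & 0x03FFFFFFu);
    if (imm26 & (1 << 25)) imm26 -= (1 << 26);          /* sign-extend 26 bits */
    return pc + (uint64_t)(imm26 * 4);
}

/* Re-encode an adrp so that it yields the same page from `new_pc`.
 * Returns 0 on success, -1 if the target is outside the +-4GiB range. */
static int relink_adrp(uint32_t insn, uint64_t old_pc, uint64_t new_pc, uint32_t *out)
{
    uint64_t page = adrp_target_page(insn, old_pc);
    int64_t delta = (int64_t)(page - (new_pc & ~0xFFFull)) / 4096;
    if (delta < -(1 << 20) || delta >= (1 << 20)) return -1;
    uint32_t imm21 = (uint32_t)delta & 0x1FFFFFu;
    /* Keep opcode bits and Rd, replace immlo and immhi. */
    *out = (insn & 0x9F00001Fu) | ((imm21 & 0x3u) << 29) | ((imm21 >> 2) << 5);
    return 0;
}

/* Re-encode a b/bl so that it reaches the same target from `new_pc`.
 * Returns 0 on success, -1 if the target is outside the +-128MiB range. */
static int relink_branch(uint32_t insn, uint64_t old_pc, uint64_t new_pc, uint32_t *out)
{
    uint64_t target = branch_target(insn, old_pc);
    int64_t delta = (int64_t)(target - new_pc) / 4;
    if (delta < -(1 << 25) || delta >= (1 << 25)) return -1;
    *out = (insn & 0xFC000000u) | ((uint32_t)delta & 0x03FFFFFFu);
    return 0;
}
```

A copy loop would walk the instructions, test each with `is_adrp` / `is_branch`, and write the re-encoded word to the destination instead of the original. The `-1` return is the case where the new location is too far away and you would need a different strategy, such as a trampoline or an `adrp`-based sequence with a longer reach.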