I'm maintaining the Reko decompiler and working on bugfixes in its support for AArch64. I've been asked to fix an issue in an AArch64 binary that contains the following instruction:
0EA0B9BF abs v31.2s,v13.2s
I've compared the output above (which comes from Reko's AArch64 disassembler) with objdump and it matches. I've consulted the ARM documentation for the abs instruction to understand what this instruction is doing.
Note that the .2s suffixes imply that the instruction is operating on two 32-bit signed integers, but v31 and v13 are 128-bit. My initial guess was that the instruction leaves the upper 64 bits of v31 untouched, but I'm not certain my interpretation is correct. Consulting scripture reveals the following pseudocode for abs:
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    if neg then
        element = -element;
    else
        element = Abs(element);
    Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
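The excerpt uses esize, datasize and elements without defining them. As a rough sketch, assuming the usual Advanced SIMD two-register layout (Q in bit 30, size in bits 23:22, Rn in bits 9:5, Rd in bits 4:0), they can be derived from the instruction word as shown here. The function name decode_abs_vector is hypothetical and is not part of Reko:

```python
def decode_abs_vector(word):
    """Extract the fields the ABS (vector) pseudocode depends on.

    Assumes `word` is already known to be an ABS (vector) encoding;
    no opcode or reserved-combination checks are performed here.
    """
    q = (word >> 30) & 1        # Q: 0 selects a 64-bit vector, 1 a 128-bit vector
    size = (word >> 22) & 0x3   # element size selector
    n = (word >> 5) & 0x1F      # source register number
    d = word & 0x1F             # destination register number

    esize = 8 << size                 # element width in bits
    datasize = 128 if q else 64       # total bits read and written
    elements = datasize // esize      # number of lanes
    return {"d": d, "n": n, "esize": esize,
            "datasize": datasize, "elements": elements}


print(decode_abs_vector(0x0EA0B9BF))
# {'d': 31, 'n': 13, 'esize': 32, 'datasize': 64, 'elements': 2}
```

For the word 0EA0B9BF this yields two 32-bit lanes and a datasize of 64, which agrees with the .2s arrangement in the disassembly.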
In the pseudocode, the result variable is not initialized with the original value of V[d], yet the final pseudo-statement appears to write back the whole 128 bits.
So: is result actually zero-initialized, meaning that the upper 64 bits are cleared after execution of this instruction? And will this apply to all SIMD instructions whose outputs do not "cover" the full 128 bits of the destination SIMD register?
Unfortunately I don't have the appropriate silicon to test this myself, and available AArch64 emulators are crashing when I try using them with SIMD instructions.
As per ARM Architecture Reference Manual (Armv8, for A-profile architecture), section C1.2.5:
This confirms that writes to 64-bit ASIMD vectors zero out the upper bits of the 128-bit register. Note that the behaviour is different in AArch32 state, where 64-bit and 128-bit vector registers have different numbering schemes and merely overlap. There, no other half exists to be zeroed out, as 64-bit operations target 64-bit vector registers.
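For a decompiler, the practical consequence can be sketched as a small model of the instruction in AArch64 state. This is only an illustration under the zeroing behaviour described above: the register file is a plain list of Python integers, and the names write_v and abs_vector are hypothetical, not Reko or ARM identifiers.

```python
def write_v(regs, d, value, width):
    """Model a SIMD register write of `width` bits.

    Python integers are unbounded, so masking to `width` bits and storing
    the result leaves every bit above `width` in the 128-bit register zero.
    """
    regs[d] = value & ((1 << width) - 1)


def abs_vector(regs, d, n, esize, datasize):
    """Model ABS (vector): lane-wise absolute value, then a `datasize`-bit write."""
    lane_mask = (1 << esize) - 1
    operand = regs[n] & ((1 << datasize) - 1)   # only the low datasize bits are read
    result = 0
    for e in range(datasize // esize):
        raw = (operand >> (e * esize)) & lane_mask
        # Reinterpret the lane as a signed two's complement integer.
        signed = raw - (1 << esize) if raw >> (esize - 1) else raw
        # Truncate back to esize bits, as element<esize-1:0> does.
        result |= (abs(signed) & lane_mask) << (e * esize)
    write_v(regs, d, result, datasize)


regs = [0] * 32
regs[31] = (1 << 128) - 1                                  # destination starts as all ones
regs[13] = (0xDEADBEEF << 64) | (0xFFFFFFFF << 32) | 0x5   # lanes: -1 and 5, junk above bit 63
abs_vector(regs, d=31, n=13, esize=32, datasize=64)
print(hex(regs[31]))  # 0x100000005
```

The stale upper 64 bits of v31 do not survive, so a lifted form of abs v31.2s,v13.2s should assign the whole 128-bit destination rather than merge into its low half.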