In MASM64, is there an instruction for pushing a 16-bit immediate on the stack?


In MASM64, if I write the instruction push 0, it pushes a 64-bit value (the immediate sign-extended to 64 bits), i.e. RSP = RSP - 8.

So if I just want to push a 16-bit immediate to set FLAGS, the only way I know is to write the machine code directly, such as:

.code
FlagFunction PROC
    dd 00006866h ; bytes 66 68 00 00: push imm16 with immediate 0
    popf
    ret
FlagFunction ENDP
END

The program works but I wonder if there is an actual instruction for this in MASM64.
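The dd trick works because x86 is little-endian: dd 00006866h emits the bytes 66 68 00 00, which are the operand-size prefix (66h), the PUSH imm16 opcode (68h) and a 16-bit immediate of 0. A small Python check of that byte layout (the helper names are made up for illustration):

```python
import struct

def dd_bytes(value: int) -> bytes:
    """Bytes that a `dd value` directive emits: one little-endian dword."""
    return struct.pack("<I", value)

def push_imm16_bytes(imm: int) -> bytes:
    """PUSH imm16 in 64-bit mode: 66h operand-size prefix, opcode 68h, little-endian imm16."""
    return bytes([0x66, 0x68]) + struct.pack("<H", imm & 0xFFFF)

emitted = dd_bytes(0x00006866)
print(emitted.hex(" "))                 # 66 68 00 00
assert emitted == push_imm16_bytes(0)   # same bytes as push imm16 0
```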



Danny Cohen answered:

According to the Intel Software Developer's Manual (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html), PUSH imm16 is allowed in 64-bit mode.

When you push a 16-bit immediate with push imm16 in x86-64 assembly, the value is not zero-extended: the instruction subtracts 2 from RSP instead of 8 and stores only the 16-bit value. This is the relevant part of the PUSH pseudocode from the manual:

IF StackAddrSize = 64
    THEN
        IF OperandSize = 64
            THEN
                RSP := RSP - 8;
                Memory[SS:RSP] := SRC; (* push quadword *)
        ELSE IF OperandSize = 32
            THEN
                RSP := RSP - 4;
                Memory[SS:RSP] := SRC; (* push dword *)
        ELSE (* OperandSize = 16 *)
                RSP := RSP - 2;
                Memory[SS:RSP] := SRC; (* push word *)
        FI;
    (* StackAddrSize = 32 and 16 cases omitted *)
FI;
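A toy Python model of the StackAddrSize = 64 branch of that pseudocode, to show that only the operand size decides how far RSP moves. The bytearray standing in for memory at SS:RSP is an assumption for illustration; the 32-bit case is kept only to mirror the pseudocode, since 64-bit mode has no 32-bit PUSH encoding.

```python
def push(stack: bytearray, rsp: int, value: int, operand_size: int) -> int:
    """Model of PUSH when StackAddrSize = 64.

    `stack` stands in for memory at SS:RSP (indexed directly by RSP) and
    `operand_size` is in bits. Returns the new RSP.
    """
    if operand_size not in (64, 32, 16):
        raise ValueError("operand size must be 64, 32 or 16")
    width = operand_size // 8
    rsp -= width                                    # RSP := RSP - 8, 4 or 2
    truncated = value & ((1 << operand_size) - 1)   # only operand_size bits are stored
    stack[rsp:rsp + width] = truncated.to_bytes(width, "little")
    return rsp

stack = bytearray(16)
rsp = push(stack, 16, 0, 16)          # push imm16 0: RSP moves by 2, not 8
assert rsp == 14
rsp = push(stack, rsp, 0x1234, 64)    # ordinary qword push
assert rsp == 6
```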
Peter Cordes answered:

64-bit and 16-bit (but not 32-bit) pushes are both possible in 64-bit mode. But normally you only want 64-bit stack operations.

MASM supports two syntaxes for 16-bit pushes. (I tested with jwasm -Zne to disable extensions that MASM wouldn't support, since I don't have MASM itself):

 pushw  123                 ; can assemble to 66 6a 7b   push sign_extended_imm8
 push  word ptr 123         ; JWASM uses      66 68 7b 00  push imm16

It seems insane to me to use ptr for an immediate; I'd have expected that to be the syntax for pushing a memory source operand with an absolute addressing mode, but that would be push word ptr [123]. MASM syntax often doesn't make sense.

(Forcing the longer encoding with word ptr, as opposed to pushw, might be unique to JWASM, which treats it as the equivalent of NASM push strict word 123. Agner Fog's objconv -fmasm disassembles 66 6A 7B as push word ptr 123. Prefer pushw, since at least with JWASM, word ptr forces the longer imm16 encoding.)
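The choice between the two encodings can be sketched like this. The function is hypothetical and is not how MASM or JWASM are actually implemented; force_imm16 just mimics JWASM's handling of push word ptr.

```python
def encode_pushw(imm: int, force_imm16: bool = False) -> bytes:
    """Encode a 16-bit push of an immediate in 64-bit mode.

    66h is the operand-size prefix; 6Ah ib pushes a sign-extended imm8,
    68h iw pushes a full imm16.
    """
    imm &= 0xFFFF
    signed = imm - 0x10000 if imm & 0x8000 else imm   # reinterpret as signed 16-bit
    if not force_imm16 and -128 <= signed <= 127:
        return bytes([0x66, 0x6A, signed & 0xFF])     # short form fits
    return bytes([0x66, 0x68]) + imm.to_bytes(2, "little")

print(encode_pushw(123).hex(" "))                    # 66 6a 7b
print(encode_pushw(123, force_imm16=True).hex(" "))  # 66 68 7b 00
print(encode_pushw(0x1234).hex(" "))                 # 66 68 34 12
```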


In NASM it's push word 123; in GAS .intel_syntax noprefix it's pushw 123. GAS Intel syntax is MASM-like and also assembles push word ptr 123 the same way. AT&T syntax is of course pushw $0x1234: operand-size suffixes are standard in AT&T syntax, whereas in Intel syntax they're a special case for instructions with an implicit memory operand.


To set FLAGS / RFLAGS

If you only need to modify the low 8 bits of FLAGS (condition codes other than OF), use mov eax, 0x00003400 / sahf - Store AH into Flags. Or for example lahf / or ah, 1 / sahf to inefficiently emulate stc (Set Carry Flag).
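What sahf and lahf do to the low byte of FLAGS can be modeled in a few lines of Python. The starting FLAGS value 0x0202 is just an example; the bit layout (SF:ZF:0:AF:0:PF:1:CF) is the one the Intel manual documents for LAHF/SAHF.

```python
SAHF_MASK = 0xD5   # bits copied from AH: SF(7), ZF(6), AF(4), PF(2), CF(0)

def lahf(flags: int) -> int:
    """LAHF: AH <- low byte of FLAGS (SF:ZF:0:AF:0:PF:1:CF)."""
    return flags & 0xFF

def sahf(flags: int, ah: int) -> int:
    """SAHF: replace the low byte of FLAGS with AH's SF/ZF/AF/PF/CF bits.
    Bit 1 is always 1 and bits 3 and 5 are always 0; OF (bit 11) is untouched."""
    return (flags & ~0xFF) | (ah & SAHF_MASK) | 0x02

flags = 0x0202                     # example: IF set, plus the always-1 bit 1
ah = (0x00003400 >> 8) & 0xFF      # mov eax, 0x00003400  ->  AH = 0x34
assert sahf(flags, ah) == 0x0216   # PF and AF set; AH bit 5 is ignored

ah = lahf(flags) | 1               # lahf / or ah, 1
assert (sahf(flags, ah) & 1) == 1  # CF now set, like stc
```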

To set RFLAGS, you want push 0x1234 (a qword push of a sign-extended imm32) / popfq. FLAGS is the low 16 bits of RFLAGS (https://en.wikipedia.org/wiki/FLAGS_register).
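The sign extension that push 0x1234 performs, and where FLAGS sits in the resulting qword, as a Python sketch. It only models the value written to the stack; popfq then loads it into RFLAGS, and the CPU ignores changes to some reserved and privileged bits.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide immediate to a 64-bit quadword, as a qword PUSH imm8/imm32 does."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64

pushed = sign_extend(0x1234, 32)        # qword stored by push 0x1234
assert pushed == 0x1234
assert pushed & 0xFFFF == 0x1234        # FLAGS is the low 16 bits of RFLAGS
assert sign_extend(0xFF, 8) == MASK64   # push -1 (imm8 0xFF) stores all ones
```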

Stack operations will always affect RSP, not ESP.

MASM / JWASM assemble popf as a 16-bit pop rather than with the default operand size for the mode, so you need popfq. Unfortunately you can't even write popfw to make it explicit; you'd need a comment. (Or use a better assembler like NASM, where pushf/popf use the same default operand size as push 123.)

If you want to avoid overwriting the reserved and special bits above the low 16 (the rest of EFLAGS/RFLAGS) with zeros, i.e. modify only FLAGS, you could use this inefficient method. It causes a store-forwarding stall, because popfq's wide load reads a qword containing a recent narrow store. popf and popfq are slowish anyway, because microcode has to check whether you're setting or clearing special flags like IF (https://agner.org/optimize/: e.g. 13 cycles on Zen 3/4, 20 cycles on Skylake).

  pushfq                                ; qword push
  mov  word ptr [rsp], 0x1234           ; modify the low word
  popfq                                 ; qword pop
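What those three instructions do to the saved qword, sketched in Python. The stack is little-endian, so the word at [rsp] is exactly FLAGS; the starting RFLAGS value in the example is arbitrary.

```python
import struct

def patch_saved_rflags(rflags: int, new_flags: int) -> int:
    """Model of pushfq / mov word ptr [rsp], imm16 / popfq on the saved qword.
    Bits 0-15 (FLAGS) are replaced; bits 16-63 come back unchanged."""
    slot = bytearray(struct.pack("<Q", rflags))        # pushfq: qword at [rsp]
    slot[0:2] = struct.pack("<H", new_flags & 0xFFFF)  # mov word ptr [rsp], imm16
    return struct.unpack("<Q", bytes(slot))[0]         # value popfq will load

assert patch_saved_rflags(0x0000_0000_0024_0246, 0x1234) == 0x0000_0000_0024_1234
```

The 2-byte store into an 8-byte slot that is immediately reloaded as a qword is the pattern that defeats store forwarding.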

Or with a 16-bit push, if temporary stack misalignment is safe (see below):

  pushw  0x1234
  popf                 ; 16-bit pop (popfw) in MASM/JWASM

The push imm16 encoding causes an LCP (length-changing prefix) stall when decoding on Intel CPUs, because the 66h prefix shrinks the immediate from 4 bytes to 2. You might consider mov eax, 0x1234 / push ax if you can't avoid using popf or popfq in code where performance matters. Or not, since LCP stalls only happen during legacy decode, not when running from the uop cache.
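The LCP condition boils down to "does the 66h prefix change how long the immediate is?". A toy Python table covering only the opcodes discussed here (not a real decoder):

```python
# Immediate size in bytes without / with the 66h operand-size prefix, 64-bit mode.
IMM_SIZE = {
    0x68: (4, 2),   # push imm32  vs. 66 68 = push imm16
    0x6A: (1, 1),   # push imm8   vs. 66 6A = pushw sign-extended imm8
    0x50: (0, 0),   # push rax    vs. 66 50 = push ax (no immediate)
}

def has_lcp(opcode: int) -> bool:
    """True if a 66h prefix on this opcode changes the instruction's length."""
    without_prefix, with_prefix = IMM_SIZE[opcode]
    return without_prefix != with_prefix

assert has_lcp(0x68)        # pushw 0x1234: LCP stall in the legacy decoders
assert not has_lcp(0x6A)    # pushw with an imm8-sized value avoids it
assert not has_lcp(0x50)    # mov eax, 0x1234 / push ax avoids it too
```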


Windows makes it unsafe to even temporarily misalign RSP by 8?

Joshua comments that Windows can randomly crash your process if RSP is ever misaligned (not a multiple of 8). I don't know the mechanism for this; perhaps delivery of SEH exceptions, if that's possible at that point in your code?

Joshua suggests that a 2-byte push could crash if it needs to grow the stack, because you'd enter the stack-growth handler with RSP misaligned. There could also be other mechanisms, which making sure this isn't the deepest the stack has ever grown wouldn't fix.

We know normal Windows code can use 8-byte push / pop in function prologues and epilogues, since compilers do that, so whatever the requirement is, it isn't the 16-byte alignment the calling convention requires at function calls.

I seem to recall something about stack-unwind metadata requiring that Windows x64 functions only modify RSP at all during their prologue and epilogue, but I think C compilers for Windows do support alloca so that can't be fully true. Of course, alloca will round up the stack adjustment to keep RSP aligned. Probably that requirement to not move RSP at all in the middle of a function only applies if you aren't using RBP as a frame pointer. If someone has something authoritative I could link re: what's safe to do with RSP in Windows programs, let me know.

I'd be surprised if Linux had any problem delivering a signal to a thread whose RSP wasn't aligned by 8 (or with anything else). The ABI guarantees RSP % 16 == 8 on function entry (so does Windows), so signal delivery already has to re-align the stack: a signal can arrive between any two instructions, and code can certainly use qword pushes (on Windows too). I assume the kernel does something like:

  user_regs.rsp -= 128;   // preserve the red zone
  user_regs.rsp &= -16;   // align
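The adjustment guessed above, sketched in Python to show that it produces a 16-byte-aligned frame even from an RSP misaligned by a 2-byte push. This models only that guess, not the actual Linux signal-frame code, and the RSP value is hypothetical.

```python
RED_ZONE = 128   # bytes below RSP that x86-64 System V leaf code may use

def signal_frame_rsp(user_rsp: int) -> int:
    """Skip the red zone, then round down to a multiple of 16."""
    rsp = user_rsp - RED_ZONE   # don't clobber the interrupted code's red zone
    return rsp & -16            # clear the low 4 bits: 16-byte aligned

user_rsp = 0x7FFC_0000_1000 - 2      # hypothetical: just after pushw from an aligned RSP
assert user_rsp % 8 == 6
frame = signal_frame_rsp(user_rsp)
assert frame == 0x7FFC_0000_0F70 and frame % 16 == 0
```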