We got this problem at school for students that want to test themselves. I have spent quite some time on this but can't figure it out.
You have 16-bit number in AX register, this number is signed. Get its absolute value, the number in AX has to be unchanged (EDIT: There isn't liminted number of registers, and the AX register can be changed but at the end of the function it needs to be the original number), and the answer should be in BX. You can only use these instructions:
MOV, ADD, XOR, SUB, NOT, AND, OR, NEG.
It is quite easy to do with SAR the way compilers do, but without it it's not clear how to get any behaviour conditional on the sign bit.
Silly idea #1: Lookup table. That can't work in 16-bit real mode. Even a whole 64kiB segment for a table isn't enough; we need twice that to be able to look up a 2-byte result for any possible 16-bit value.
We could do this with 32-bit addressing easily, like
xor ebx, ebx/mov bx, ax/mov bx, [table + ebx*2], if you can justify 128kiB of table data. :PFully within the rules, you could construct the table on the stack in 32-bit or 64-bit mode with
sub esp, 1<<17and store the data withmov word [esp+0], 0/mov word [esp + 2], 1/ etc. Fully unrolled, no looping, so about 256kiB of machine code. But again, this doesn't work in real mode and is a total joke for efficiency.We can use x86 partial register shenanigans to isolate the sign bit as a 0 / 1 integer:
Or the last 3 instructions can be optimized to 2
Jackpot; we have our
sar ax,15orcwdvalue that has all bits 0 or all bits 1, broadcasting the sign bit of AX, ready to use with a 2's complement identity (How to prove that the C statement -x, ~x+1, and ~(x-1) yield the same results?) like compilers use (https://godbolt.org/z/n3yoUp).Typically you'd use
xor ax, dx/sub ax, dxto modify the original value.I had previously thought the challenge required you to avoid modifying any other registers, otherwise the restriction on leaving AX unmodified is trivial and not worth making part of the challenge. But I don't think it's possible without extra scratch space in memory or another register. The edit clarified that there's no need for that.
XOR with
-1flips all the bits like NOT. XOR with0is a no-op.SUB with
-1increments by 1, SUB with0is a no-op. (0is the identity element for addition and xor.)So this conditionally applies the 2's complement
-x = ~x + 1identity.PS: It took me several minutes of head-scratching to think of this, ruling out any full-register approaches, and I'm very familiar with x86 and pretty familiar with bit-manipulation, e.g. writing codegolf.SE answers in x86 machine code and doing non-trivial stuff with SIMD. IMO this is a fun hard challenge.
Also, you'd never want to write code like this in real life;
cwdorcdqis much more efficient, or for source regs other than AX, copy andsar. The partial-register stuff will even cause stalls on some out-of-order execution CPUs like Intel PPro through Nehalem.For example, on the Godbolt compiler explorer for this source:
Using an unsigned return value allows us to avoid signed-integer overflow undefined behaviour for the most-negative 2's complement integer. (
-INT_MINis undefined behaviour). I think the way I wrote it actually relies on the C implementation being 2's complement, though, because0U - xconvertsxto unsigned to match the other side before using it as an operand to binary-. Or maybe that's what we want, for unsigned0U-xto produce0x8000from an input of0x8000(for 16-bit int).GCC does this to set EAX = abs(EDI) (x86-64 System V calling convention).
clang does this for x86-64, using a conditional move that reads flags from NEG:
it would have been more efficient on some CPUs to do:
Sandybridge eliminates xor-zeroing but not mov, and for CPUs that don't do
movelimination this shortens the critical path.mov eax,ediis on the critical path butxor-zeroing isn't. Or we could have donemov eax, edi/neg edi/cmovnl eax, edito again allow MOV and NEG to run in parallel.CMOV is a 2-uop instruction on Intel CPUs before Broadwell. (CMOVA and CMOVBE are still 2 uops on current Intel because they read CF and ZF, which are renamed separately in different groups. Others are 1 uop)