Absolute value without branching or shifts, only add/sub and booleans

Question

Absolute value without branching or shifts, only add/sub and booleans

1.6k Views Asked by Adam At 12 April 2020 at 21:00

We got this problem at school for students that want to test themselves. I have spent quite some time on this but can't figure it out.

You have 16-bit number in AX register, this number is signed. Get its absolute value, the number in AX has to be unchanged (EDIT: There isn't liminted number of registers, and the AX register can be changed but at the end of the function it needs to be the original number), and the answer should be in BX. You can only use these instructions:
MOV, ADD, XOR, SUB, NOT, AND, OR, NEG.

It is quite easy to do with SAR the way compilers do, but without it it's not clear how to get any behaviour conditional on the sign bit.

Original Q&A

There are 2 best solutions below

**Peter Cordes** · Answer 1 · 2020-04-12T22:30:38.120000

Silly idea #1: Lookup table. That can't work in 16-bit real mode. Even a whole 64kiB segment for a table isn't enough; we need twice that to be able to look up a 2-byte result for any possible 16-bit value.

We could do this with 32-bit addressing easily, like xor ebx, ebx / mov bx, ax / mov bx, [table + ebx*2], if you can justify 128kiB of table data. :P

Fully within the rules, you could construct the table on the stack in 32-bit or 64-bit mode with sub esp, 1<<17 and store the data with mov word [esp+0], 0 / mov word [esp + 2], 1 / etc. Fully unrolled, no looping, so about 256kiB of machine code. But again, this doesn't work in real mode and is a total joke for efficiency.

We can use x86 partial register shenanigans to isolate the sign bit as a 0 / 1 integer:

    xor  dx, dx           ; DX = 0
    mov  dl, ah           ; DX = AX>>8   (zero extended)
    add  dx, dx           ; DX <<= 1  shifts the sign bit alone into DH

    mov  dl, dh
    mov  dh, 0            ; DX = (AX<0) = sign bit of AX zero extended to 16-bit

    neg  dx               ; DX = 0 or -1

Or the last 3 instructions can be optimized to 2

    neg  dh               ; 0 or -1 according to sign bit of AX
    mov  dl, dh           ; duplicate to the full DX = 0 or -1

Jackpot; we have our sar ax,15 or cwd value that has all bits 0 or all bits 1, broadcasting the sign bit of AX, ready to use with a 2's complement identity (How to prove that the C statement -x, ~x+1, and ~(x-1) yield the same results?) like compilers use (https://godbolt.org/z/n3yoUp).

Typically you'd use xor ax, dx / sub ax, dx to modify the original value.

I had previously thought the challenge required you to avoid modifying any other registers, otherwise the restriction on leaving AX unmodified is trivial and not worth making part of the challenge. But I don't think it's possible without extra scratch space in memory or another register. The edit clarified that there's no need for that.

    mov  bx, ax
    xor  bx, dx           ; ~x      or x
    sub  bx, dx           ; ~x + 1  or x

XOR with -1 flips all the bits like NOT. XOR with 0 is a no-op.

SUB with -1 increments by 1, SUB with 0 is a no-op. (0 is the identity element for addition and xor.)

So this conditionally applies the 2's complement -x = ~x + 1 identity.

PS: It took me several minutes of head-scratching to think of this, ruling out any full-register approaches, and I'm very familiar with x86 and pretty familiar with bit-manipulation, e.g. writing codegolf.SE answers in x86 machine code and doing non-trivial stuff with SIMD. IMO this is a fun hard challenge.

Also, you'd never want to write code like this in real life; cwd or cdq is much more efficient, or for source regs other than AX, copy and sar. The partial-register stuff will even cause stalls on some out-of-order execution CPUs like Intel PPro through Nehalem.

For example, on the Godbolt compiler explorer for this source:

unsigned absval(int x) {
    return x<0 ? 0U - x : x;
}

Using an unsigned return value allows us to avoid signed-integer overflow undefined behaviour for the most-negative 2's complement integer. (-INT_MIN is undefined behaviour). I think the way I wrote it actually relies on the C implementation being 2's complement, though, because 0U - x converts x to unsigned to match the other side before using it as an operand to binary -. Or maybe that's what we want, for unsigned 0U-x to produce 0x8000 from an input of 0x8000 (for 16-bit int).

GCC does this to set EAX = abs(EDI) (x86-64 System V calling convention).

    mov     eax, edi
    cdq                      ; sign-extend EAX into EDX:EAX
    xor     eax, edx
    sub     eax, edx
    ret

clang does this for x86-64, using a conditional move that reads flags from NEG:

    mov     eax, edi
    neg     eax                 ; 0 - x
    cmovl   eax, edi            ; copy the original if 0 was < x
    ret

it would have been more efficient on some CPUs to do:

    ; shorter critical path on CPUs where mov is not zero latency
    xor     eax, eax
    sub     eax, edi            ; 0 - x
    cmovl   eax, edi            ; copy the original if 0 was < x
    ret

Sandybridge eliminates xor-zeroing but not mov, and for CPUs that don't do mov elimination this shortens the critical path. mov eax,edi is on the critical path but xor-zeroing isn't. Or we could have done mov eax, edi / neg edi / cmovnl eax, edi to again allow MOV and NEG to run in parallel.

CMOV is a 2-uop instruction on Intel CPUs before Broadwell. (CMOVA and CMOVBE are still 2 uops on current Intel because they read CF and ZF, which are renamed separately in different groups. Others are 1 uop)

**Adam** · Answer 2 · 2020-04-12T23:19:45.640000

So thanks to Peter Cordes's answer, the code is quite simple, the problem was the SAR instruction, but Peter created it quite nicely.

The number is in already loaded to AX

; this is practicaly the SAR instruction, 
; the mask for the absolute value is 
; number >> (sizeof(short)) * CHAR_BIT -1)
; number >>        (2 * 8) - 1 = 15
; so normaly we would do SAR bx, 15 and done

mov bl, ah  ; BX = AX>>8
add bx, bx  ; BX <<= 1
neg bh      ; 0 or -1 
mov bl, bh  ; duplicate the full BX

mov cx, ax  ;
add cx, bx  ; the number + mask 
xor bx, cx  ; (number+mask) ^ mask

now the answer is in BX, and AX was not altered

Absolute value without branching or shifts, only add/sub and booleans

There are 2 best solutions below

Related Questions in ASSEMBLY

Related Questions in X86

Related Questions in BIT-MANIPULATION

Related Questions in ABSOLUTE-VALUE

Related Questions in BRANCHLESS

Trending Questions

Popular # Hahtags

Popular Questions