How do I get this Assembly branching logic to work?

83 Views Asked by At

I've been trying to implement a multiplication program in ARMv8 Assembly. I've been able to get the skeleton to work properly, but for some reason it isn't multiplying properly.

    .initial:      .string "multiplier = 0x%08x (%d) multiplicand = 0x%08x (%d)\n\n"
    .product1:     .string "product = 0x%08x multiplier = 0x%08x\n\n"
    .result1:      .string "64-bit result = 0x%016llx (%lld)\n\n" // Corrected format specifier

    define(FALSE, 0)
    define(TRUE, 1)
    define(multiplier, w18)
    define(multiplicand, w19)
    define(product, w20)
    define(i, w21)
    define(negative, w22)
    define(result, x18)
    define(temp1, x19)
    define(temp2, x20)
    define(product64, x21)
    define(multiplier64, x22)
    define(multiplicand64, x23)

    .balign 4
    .global main
    .text

main:
    stp     x29, x30, [sp, -16]!
    mov     x29, sp

    mov     multiplicand, -30
    mov     multiplier, 70
    mov     product, 0
    mov     i, 0
print1:
    adrp    x0, .initial
    add     x0, x0, :lo12:.initial
    mov     w1, multiplier
    mov     w2, multiplier
    mov     w3, multiplicand
    mov     w4, multiplicand
    bl      printf

multiplier_check:
    cmp     multiplier, 0
    b.ge    Loop1
    mov     negative, TRUE
Loop1:
    add     i, i, 1
    cmp     i, 32
    b.gt    end
    tst     multiplier, 0x1
    b.eq    Loop3
    asr     multiplier, multiplier, 1

Loop3:
    add     product, product, multiplicand

Loop2:
    tst     product, 0x1
    b.ne    Loop4
    orr     multiplier, multiplier, 0x80000000
Loop4:
    and     multiplier, multiplier, 0x7FFFFFFF
Loop5:
    asr     product, product, 1

negative_check:
    cmp     negative, TRUE
    b.eq    Loop6
    b       print2
Loop6:
    sub     product, product, multiplicand
    b Loop5 // Corrected loop condition

print2:
    adrp    x0, .product1
    add     x0, x0, :lo12:.product1
    mov     w1, product
    mov     w2, multiplier
    bl      printf

    sxtw    product64, product
    sxtw    multiplier64, multiplier
    mul     result, product64, multiplier64 // Corrected multiplication for 64-bit result

    adrp    x0, .result1
    add     x0, x0, :lo12:.result1
    mov     x1, result
    mov     x2, result
    bl      printf
end:
    mov     w0, 0
    ldp     x29, x30, [sp], 16
    ret

Normally its supposed to output the product and multiplicand in 8-bit hexadecimal along with the 64-bit result, but innstead of outputting it properly I keep getting this:

multiplier = 0x00000046 (70) multiplicand = 0xffffffe2 (-30)

product = 0xfffffff1 multiplier = 0x00000046

64-bit result = 0xfffffffffffffbe6 (-1050)

In order to fix this I tried to change the branch logic to see what been going wrong, but I keep getting this same output. (Although I was able to get the proper sign now.) Can anyone help me see where the logic is breaking?

Edit: For clarification, I use these commands for it to run.

m4 assign2a.asm > assign2a.s
 gcc assign2a.s -o e.o
 ./e.o
1

There are 1 best solutions below

8
Nate Eldredge On

I haven't gone through your entire program, but here are some comments. Some are definitely bugs (but may or may not be the specific bug causing your problem), some are just sketchy coding.

General coding style and practice

  • Comment your code! Comments should explain, not only what the code is doing, but more importantly why.

  • Use more descriptive label names than Loop1, Loop2, etc. What is supposed to happen when you branch to this label? Name it accordingly.

  • Single-step with a debugger to see what your code is actually doing, and inspect the register contents as you go. If you understand your code well, then you should know what values should be in all the registers at all times. When you see them diverge from what you expect, you have found a bug. It looks like you are building and running your code natively on some aarch64 Unix-like system, so that means you can run gdb or lldb natively as well.

  • Defining symbolic names for all your registers is nice in some ways, but it's risky that you've chosen registers that overlap. (You do know that, e.g., w18 is the low 32 bits of x18, right?) So you have to remember that multiplier and result conflict with each other, and that writing to one will overwrite the other; and the same for all the other pairs. This sounds like a recipe for bugs.

Calling convention violations

  • The call-preserved registers in the standard aarch64 calling conventions are x19-x28. So w18/x18 is not call-preserved and may be overwritten when you call printf or any other C function.

  • You're guaranteed that a function call will not clobber x19-x28, but by that same token, your function (even if it is main) must not clobber them either. So push any of those registers that you use at the start of main, and pop them before returning.

Branch logic bugs

  • In

      multiplier_check:
          cmp     multiplier, 0
          b.ge    Loop1
          mov     negative, TRUE
      Loop1:
    

    The negative register (really w22) has not been initialized before this. So if multiplier is nonnegative and the branch is taken, negative retains its previous garbage value which won't necessarily be 0.

    This is a situation (as in many other places in your code) where a branch can be replaced with a conditional-select. You can simply do

        cmp multiplier, 0
        cset negative, lt
    

    which will set the negative register to 0 or 1 according to whether the condition code lt (complement of ge) is false or true.

    (Actually, this one could be optimized further as the single instruction lsr negative, multiplier, 31 to just shift the sign bit of multiplier into the low bit of negative.)

  • Loop2:
        tst     product, 0x1
        b.ne    Loop4
        orr     multiplier, multiplier, 0x80000000
    Loop4:
        and     multiplier, multiplier, 0x7FFFFFFF
    

    This doesn't look right. If the branch is not taken, then orr sets the sign bit of multiplier, but then you fall through to the and instruction which just clears that bit again.

    You could put an unconditional branch after the orr to jump over the and. However this whole sequence can also be replaced with a bitfield move: bfi multiplier, product, 31, 1 to insert the low bit of product into bit 31 of multiplier.