dosseg
.model small
.stack 100h
.data
array db -1, -2, -3, -4, 1,2, 3, -5
.code
main PROC
mov ax, @data
mov ds, ax
xor ax, ax
xor dx, dx ; reset dx
lea si, array
mov cx, 8
back:
mov bl, [si]
cmp al, bl
jc continue ; carry will be generated if number in bl is positive
inc dx
continue:
inc si
clc
loop back
mov ah, 4ch
int 21h
main ENDP
end main
I wrote the above program to find the number of negative integers in an array.
Debugging showed that when SI is pointing at -1, the carry flag becomes 1 but it should not as the value at that instant in BL is FFh (negative) and in AL is 00h, so subtracting negative number from 0 should not generate a carry. What am I doing wrong?
Edit: I replaced the erroneous part with:
test bl, bl
jns continue
and now it works as expected but I still don't know why the cmp method did not work.
If you just want to branch, use sign-flag or signed-compare conditions
test reg,reg / jns non_negative (not sign-bit-set) or jnl non_negative (not less-than) are equivalent after a compare with zero. That uses the FLAGS and conditions for their normal semantic meaning, i.e. doing a normal signed compare.
(test same,same is equivalent to cmp against zero, always clearing OF and CF, and is a well-known optimization for cmp reg, 0.)
What you're doing doesn't set CF in a way that reflects the sign bit, so a jc (jump if CF set) isn't useful. With AL = 0, cmp al, bl sets CF for every non-zero byte, i.e. whenever 0U < (unsigned)x is true, so the jc skips inc dx for all of them and you end up counting only the zero elements.
Getting the carry flag set according to the MSB
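As a baseline for this section, here is a minimal Python sketch of what the question's loop computes, compared with the test/jns fix. It is only a model of the flag arithmetic, not code from the question; the helper names cmp_carry and sign_flag are hypothetical.

```python
def cmp_carry(a, b):
    """CF after an 8-bit `cmp a, b`: set when a - b borrows (unsigned a < b)."""
    return int((a & 0xFF) < (b & 0xFF))


def sign_flag(x):
    """SF after `test x, x` on a byte: a copy of bit 7."""
    return (x >> 7) & 1


# The question's array as raw two's-complement bytes.
array = [v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5)]

# Original loop: AL = 0, `cmp al, bl` / `jc continue` / `inc dx`.
# inc dx only runs when CF = 0, which happens only for a zero byte.
count_cmp = sum(1 for b in array if cmp_carry(0, b) == 0)

# Fixed loop: `test bl, bl` / `jns continue` / `inc dx`.
count_test = sum(1 for b in array if sign_flag(b) == 1)

print(count_cmp, count_test)  # 0 5
```

The first count is 0 because the array has no zero bytes; the second is the number of negative elements.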
It's only interesting to get your condition into CF if you're going to take advantage of that by using adc dx, 0 or sbb dx, -1 to conditionally increment DX (when CF is 1 or 0, respectively).
The sbb version is like dx -= -1 + CF, so CF either cancels out the -1, or you subtract -1, i.e. add 1.
One way to get CF set according to the sign bit of a byte is simply to shift it out, e.g. shl bl, 1, if you don't mind destroying the value in BL. Equivalently, add bl,bl is also a 2-byte instruction but can run on more execution units on modern CPUs. (They both set FLAGS the same way, including CF.)
It's not possible with a compare against zero.
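The shift trick, the adc/sbb increments, and the claim about comparing against zero can all be checked exhaustively over the 256 byte values. This is a rough Python model under the usual x86 flag definitions; the function names are made up for illustration.

```python
def shl1_carry(x):
    """CF after `shl x, 1` (or `add x, x`) on a byte: the old bit 7."""
    return (x >> 7) & 1


def cmp_carry(a, b):
    """CF after an 8-bit `cmp a, b`: set when unsigned a < b."""
    return int((a & 0xFF) < (b & 0xFF))


def adc16(dst, src, cf):
    """16-bit `adc dst, src`: dst + src + CF, wrapped to 16 bits."""
    return (dst + src + cf) & 0xFFFF


def sbb16(dst, src, cf):
    """16-bit `sbb dst, src`: dst - src - CF, wrapped to 16 bits."""
    return (dst - src - cf) & 0xFFFF


all_bytes = range(256)
# Shifting out the top bit gives CF = sign bit for every byte value.
shift_matches_sign = all(shl1_carry(x) == int(x >= 0x80) for x in all_bytes)
# 0 - x borrows for every non-zero x; x - 0 never borrows.
zero_minus_x = all(cmp_carry(0, x) == int(x != 0) for x in all_bytes)
x_minus_zero = all(cmp_carry(x, 0) == 0 for x in all_bytes)

array = [v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5)]
dx_adc = 0
dx_sbb = 0
for b in array:
    cf = shl1_carry(b)
    dx_adc = adc16(dx_adc, 0, cf)       # adc dx, 0: +1 when CF = 1
    dx_sbb = sbb16(dx_sbb, 0xFFFF, cf)  # sbb dx, -1: +1 when CF = 0

print(shift_matches_sign, zero_minus_x, x_minus_zero)  # True True True
print(dx_adc, dx_sbb)  # 5 3
```

With CF taken from the shifted-out sign bit, the adc form counts the negative bytes and the sbb form counts the non-negative ones.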
0 - x always has a borrow (CF=1) for any non-zero x, and x - 0 never has carry-out.
Without modifying the register value, it is possible with cmp, though: 0x7f - x has unsigned wrapping (i.e. borrow output that sets CF) for x >= 0x80 unsigned, i.e. for values with their MSB set.
You don't need clc in this or your version. CF isn't "sticky"; anything that updates its value sets it to 0 or 1 regardless of the old value. And it's not an input for cmp.
We can't set CF=1 for bl < 0 (aka bl >= 0x80U) with cmp bl, constant, unfortunately. It only works the way you're doing it, setting another register to compare against. (cmp reg, 123 exists, cmp 123, reg doesn't; most 2-operand instructions modify their destination and wouldn't make sense with an immediate destination, so it would be a special case to have yet another opcode for cmp in the other direction.)
But you can do cmp bl, 0x80 to set CF when bl < 0x80 unsigned, i.e. when its sign bit isn't set. CF is then clear for negative values, which is the case sbb dx, -1 counts.
Loading the value into a register with mov bl, [si] can be helpful for debugging, making it show up in your debugger's window of registers instead of having to examine memory. But that's not necessary; cmp works with reg or memory operands (or an immediate), saving an instruction.
As a further optimization for code-size inside the loop, scasb is equivalent to cmp al, es:[di] / inc di (but the inc part doesn't set FLAGS). And it's actually dec di if DF is set, so you'd want cld somewhere in your program before a loop using "string" instructions to make sure they go in the forward direction.
Using scasb means you need to use AL for that. Without scasb, you could count into AL inside the loop, where it could be the exit status for your DOS call. (Perhaps that's why you were trying to use AL=0, if you wanted to exit(0) instead of returning a value.) scasb isn't particularly fast on modern CPUs, but it is on real 8086; so is the loop instruction, because they're both compact code-size. loop is a special-case optimization for dec cx / jnz (but also without affecting FLAGS).
Or with 386 instructions, bt word ptr [si], 7 to Bit Test that bit, putting the result in CF where you can adc dx, 0.
bt is slow on modern CPUs with bt mem, reg (like 10 uops) because it can index outside the word indexed by the addressing mode. So it would be less efficient to put bt word ptr [array], cx in a loop with cx initially = 7 and incrementing with add cx, 8 inside the loop. But that would work.
bt is not too bad with bt mem, imm, only 2 uops on most modern Intel and 1 on some AMD (https://uops.info/). It's only a single uop for bt reg, imm or bt reg,reg, like cmp, if you want to load first. (It can't macro-fuse with branches into a single uop, so if branching instead of adc, a cmp/jle would be more efficient as well as more readable.) On AMD, bts/btr/btc to also modify the bit are slower than bt even for reg,reg, decoding to extra uops.
SSE2 + popcnt to check 4, 8, or 16 bytes at once
The extra fun way, since you have exactly 8 bytes, uses SSE2 and popcnt. (Yes, this can work in 16-bit real mode, unlike AVX. In a bootloader and maybe DOS you'd have to manually enable the control-register bits that make SSE instructions not fault. Of course it only works on CPUs with popcnt, like Nehalem and later from 2008 or so; otherwise use pcmpgtb / psadbw / movq for just SSE2, or SSE1 using MMX registers.)
This would also work easily for 4- or 16-byte arrays; for other compile-time-constant sizes, do 2 loads and shift out the overlapping bytes.
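The text names only SSE2 and popcnt, so the sketch below assumes the usual sequence: load the 8 bytes with movq (zero-extending into the 16-byte register), collect the sign bits with pmovmskb, then popcnt the mask. It is a Python model of that assumed sequence, and the helper names simply mirror the instructions they stand in for.

```python
def pmovmskb(chunk):
    """Model of SSE2 pmovmskb: bit i of the result is the top bit of byte i."""
    mask = 0
    for i, b in enumerate(chunk):
        mask |= ((b >> 7) & 1) << i
    return mask


def popcnt(x):
    """Number of set bits in x."""
    return bin(x).count("1")


array = bytes(v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5))

# movq loads 8 bytes and zeroes the upper 8 bytes of the XMM register,
# so the padding bytes can never contribute a set sign bit.
xmm0 = array + bytes(8)

mask = pmovmskb(xmm0)
print(hex(mask), popcnt(mask))  # 0x8f 5
```

Bits 0 to 3 and bit 7 of the mask are set, matching the positions of the five negative bytes.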
For other element sizes, there's movmskps (dword) and movmskpd (qword).
With a larger array, you'd want to start accumulating counts in vector regs, like pcmpgtb to compare for 0 > x / psubb xmm1, xmm0 to do total -= (0 or -1), up to 255 iterations of 16 bytes. Then accumulate with psadbw against zero. Same problem as "How to count character occurrences using SIMD" but replacing pcmpeqb with pcmpgtb.
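The accumulate-then-flush pattern for larger arrays can be sketched as follows. This is a scalar Python model of the per-byte vector operations, assuming 16-byte vectors and zero padding for a partial final chunk; count_negative_bytes and the helper names are hypothetical, not taken from the linked answer.

```python
def pcmpgtb_zero(chunk):
    """pcmpgtb with a zeroed first operand: 0xFF where 0 > x (signed byte), else 0."""
    return [0xFF if b >= 0x80 else 0x00 for b in chunk]


def psubb(acc, mask):
    """Per-byte wrapping subtract; subtracting 0xFF (-1) adds 1 to that counter."""
    return [(a - m) & 0xFF for a, m in zip(acc, mask)]


def psadbw_zero(acc):
    """psadbw against zero: horizontal sum of each 8-byte half."""
    return sum(acc[:8]), sum(acc[8:])


def count_negative_bytes(data):
    """Count bytes with the sign bit set, flushing byte counters every 255 vectors."""
    chunks = [data[i:i + 16] for i in range(0, len(data), 16)]
    total = 0
    for start in range(0, len(chunks), 255):
        acc = [0] * 16  # sixteen 8-bit counters, each can reach at most 255
        for chunk in chunks[start:start + 255]:
            padded = list(chunk) + [0] * (16 - len(chunk))  # zeros are never counted
            acc = psubb(acc, pcmpgtb_zero(padded))
        lo, hi = psadbw_zero(acc)
        total += lo + hi
    return total


print(count_negative_bytes(bytes(v & 0xFF for v in (-1, -2, -3, -4, 1, 2, 3, -5))))  # 5
```

Flushing after at most 255 vectors is what keeps the 8-bit counters from wrapping, since each vector adds at most 1 to any given counter.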