dosseg
.model small
.stack 100h
.data
array db -1, -2, -3, -4, 1,2, 3, -5
.code
main PROC
mov ax, @data
mov ds, ax
xor ax, ax
xor dx, dx ; reset dx
lea si, array
mov cx, 8
back:
mov bl, [si]
cmp al, bl
jc continue ; carry will be generated if number in bl is positive
inc dx
continue:
inc si
clc
loop back
mov ah, 4ch
int 21h
main ENDP
end main

I wrote the above program to count the negative integers in an array.
Debugging showed that when SI points at -1, the carry flag becomes 1. I expected it to stay clear: at that instant BL holds FFh (negative) and AL holds 00h, so subtracting a negative number from 0 should not generate a carry. What am I doing wrong?
Edit: I replaced the erroneous part with:
test bl, bl
jns continue
and now it works as expected, but I still don't know why the cmp method did not work.

If you just want to branch, use sign-flag or signed-compare conditions: `test reg,reg` / `jns non_negative` (sign bit not set) or `jnl non_negative` (not less-than) are equivalent after a compare with zero. That uses the FLAGS and conditions for their normal semantic meaning, i.e. doing a normal signed compare.
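
As a minimal sketch of that signed-branch approach, the loop from the question could look like the following. It is a fragment meant to replace the counting loop inside the question's program, keeping its registers and data; the label names `count_loop` and `non_negative` are made up for this example.

```asm
; Sketch only: drop-in replacement for the question's counting loop.
; count_loop / non_negative are hypothetical label names.
        xor   dx, dx            ; DX = count of negative bytes
        lea   si, array
        mov   cx, 8             ; number of elements
count_loop:
        mov   bl, [si]          ; load the element (also handy in a debugger)
        cmp   bl, 0             ; signed compare with zero
        jnl   non_negative      ; taken when bl >= 0 (SF == OF)
        inc   dx                ; bl < 0: count it
non_negative:
        inc   si
        loop  count_loop        ; no clc needed: cmp rewrites CF every time
```

With the question's array this should leave DX = 5. Using `test bl, bl` / `jns non_negative` in place of the `cmp` / `jnl` pair behaves the same.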

(`test same,same` is equivalent to `cmp` against zero, always clearing OF and CF, and is a well-known optimization for `cmp reg, 0`.)

What you're doing doesn't set CF in a way that reflects the sign bit, so a `jc` (jump if CF set) isn't useful. With AL=0, `cmp al, bl` sets CF for every non-zero number, the ones where `0U < (unsigned)x` is true, so your `jc` skips the `inc dx` for all of them and only zeros would be counted.

Getting the carry flag set according to the MSB

It's only interesting to get your condition into CF if you're going to take advantage of that by using `adc dx, 0` or `sbb dx, -1` to conditionally increment DX (when CF is 1 or 0, respectively). The `sbb` version is like `dx -= -1 + CF`, so CF either cancels out the -1, or you subtract -1, i.e. add 1.

One way to get CF set according to the sign bit of a byte is simply to shift it out, e.g. `shl bl, 1`, if you don't mind destroying the value in BL. Equivalently, `add bl,bl` is also a 2-byte instruction but can run on more execution units on modern CPUs. (They both set FLAGS the same way, including CF.)

It's not possible with a compare against zero.
`0 - x` always has a borrow (CF=1) for any non-zero `x`, and `x - 0` never has carry-out.

Without modifying the register value, it is possible with `cmp`, though (with 0x7f in AL instead of 0): `0x7f - x` has unsigned wrapping (i.e. borrow output that sets CF) for x >= 0x80 unsigned, i.e. for values with their MSB set.

You don't need `clc` in this or your version. CF isn't "sticky"; anything that updates its value sets it to 0 or 1 regardless of the old value. And it's not an input for `cmp`.

We can't set CF=1 for `bl < 0` (aka `bl >= 0x80U`) with `cmp bl, constant`, unfortunately. It only works the way you're doing it, setting another register to compare against. (`cmp reg, 123` exists, `cmp 123,reg` doesn't; most 2-operand instructions modify their destination and wouldn't make sense with an immediate destination, so it would be a special case to have yet another opcode for `cmp` in the other direction.)

But you can do `cmp bl, 0x80` to set CF when `bl < 0x80` (unsigned), i.e. when its sign bit isn't set. That leaves CF=0 for negative values, which is the case where `sbb dx, -1` increments.

Loading the value into a register with `mov bl, [si]` can be helpful for debugging, making it show up in your debugger's window of registers instead of having to examine memory. But that's not necessary; `cmp` works with reg or memory operands (or an immediate), saving an instruction.

As a further optimization for code-size inside the loop, `scasb` is equivalent to `cmp al, es:[di]` / `inc di` (but the `inc` part doesn't set FLAGS). And it's actually `dec di` if DF is set, so you'd want `cld` somewhere in your program before a loop using "string" instructions to make sure they go in the forward direction.

Using `scasb` means you need to use AL for that. Without `scasb`, you could count into AL inside the loop, where it could be the exit status for your DOS call. (Perhaps that's why you were trying to use AL=0, if you wanted to exit(0) instead of returning a value.)

`scasb` isn't particularly fast on modern CPUs, but it is on real 8086; so is the `loop` instruction, because they're both compact code-size. `loop` is a special-case optimization for `dec cx` / `jnz` (but also without affecting FLAGS).

Or with 386 instructions, `bt word ptr [si], 7` to Bit Test that bit, putting the result in CF where you can `adc dx, 0`.

`bt` is slow on modern CPUs with `bt mem, reg` (like 10 uops) because it can index outside the word indexed by the addressing mode. So it would be less efficient to put `bt word ptr [array], cx` in a loop with `cx` initially = `7` and incrementing with `add cx, 8` inside the loop. But that would work.

`bt` is not too bad with `bt mem, imm`, only 2 uops on most modern Intel and 1 on some AMD (https://uops.info/). It's only a single uop for `bt reg, imm` or `bt reg,reg`, like `cmp`, if you want to load first. (It can't macro-fuse with branches into a single uop, so if branching instead of `adc`, a `cmp` / `jle` would be more efficient as well as more readable.) On AMD, `bts` / `btr` / `btc` to also modify the bit are slower than `bt` even for `reg,reg`, decoding to extra uops.

SSE2 + popcnt to check 4, 8, or 16 bytes at once

The extra fun way, since you have exactly 8 bytes, uses SSE2 and `popcnt`. (Yes, this can work in 16-bit real mode, unlike AVX. In a bootloader and maybe DOS you'd have to manually enable the control-register bits that make SSE instructions not fault. Of course it only works on CPUs with `popcnt`, like Nehalem and later from 2008 or so; otherwise use `pcmpgtb` / `psadbw` / `movq` for just SSE2, or SSE1 using MMX registers.)

The same approach also works easily for 4-byte or 16-byte arrays; for other compile-time-constant sizes, do 2 loads and shift out the overlapping bytes.
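
A minimal sketch of the 8-byte SSE2 + `popcnt` idea might look like this. It assumes SSE is already enabled, the CPU supports `popcnt`, and the assembler is set up to accept these mnemonics in 16-bit code; the register choices are illustrative, not taken from the original post.

```asm
; Sketch only: count negative bytes in the 8-byte array from the question.
; Assumes SSE2 + POPCNT are usable and DS already points at the data segment.
        movq     xmm0, qword ptr [array] ; load 8 bytes; upper half of XMM0 is zeroed
        pmovmskb eax, xmm0               ; bit i of EAX = sign bit of byte i
        popcnt   ax, ax                  ; AX = number of set bits = negative count
```

For the question's array the mask would be 8Fh (bytes 0-3 and 7 are negative), so AX should end up as 5. Because `movq` zeroes the upper 8 bytes of the register, the upper mask bits are 0 and don't affect the count.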

For other element sizes, there's `movmskps` (dword) and `movmskpd` (qword).

With a larger array, you'd want to start accumulating counts in vector regs, like `pcmpgtb` to compare for `0 > x` / `psubb xmm1, xmm0` to do `total -= (0 or -1)`, up to 255 iterations of 16 bytes. Then accumulate with `psadbw` against zero. Same problem as "How to count character occurrences using SIMD" but replacing `pcmpeqb` with `pcmpgtb`.
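
The accumulate-in-vector-registers idea could be sketched roughly as below. Everything about the setup is an assumption for illustration: SSE2 is enabled, SI points at the data, CX holds the number of 16-byte chunks (1 to 255), the array length is a multiple of 16, and the label name `chunk_loop` is made up.

```asm
; Sketch only: count negative bytes in CX chunks of 16 bytes starting at [SI].
; CX must be 1..255 so the per-byte counters in XMM1 cannot overflow.
        pxor     xmm1, xmm1              ; XMM1 = 16 byte-sized counters, all 0
chunk_loop:
        movdqu   xmm2, xmmword ptr [si]  ; load 16 bytes, no alignment required
        pxor     xmm0, xmm0              ; XMM0 = 0
        pcmpgtb  xmm0, xmm2              ; each byte = 0FFh where 0 > x, else 0
        psubb    xmm1, xmm0              ; counters -= (0 or -1)
        add      si, 16
        dec      cx
        jnz      chunk_loop
        pxor     xmm0, xmm0
        psadbw   xmm1, xmm0              ; sum the counters: one total per 8-byte half
        movhlps  xmm0, xmm1              ; copy the high half's total to the low half
        paddd    xmm1, xmm0              ; low dword = sum over all 16 counters
        movd     eax, xmm1               ; EAX = number of negative bytes
```

For more than 255 chunks, one option is to run this as an inner loop and add each `psadbw` result into a wider accumulator before resetting the byte counters.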