I am venturing into instruction decoding, for now on 16-bit 80x86 machines. I have few problems decoding instructions that do not take an immediate value as their source operand. The problem arises when the source operand is no longer a register or a memory location but an immediate value. I would decode the following instruction this way:
mov ax, 3
101110|11| |11|000|000| 00000011
  101110   -> opcode
  11       -> s = 1, w = 1
  11       -> the second operand is a register
  000      -> null
  000      -> register AX
  00000011 -> 3 with sign extension
However, that is not correct. This is the right encoding:
mov ax, 3
10111000 00000011 00000000
Can someone explain to me how decoding works assuming the source operand is an immediate value?
Decoding for the x86 works by consulting tables.
If you are given a byte that you already know is an instruction opcode (and not an instruction prefix), and that byte holds the value B8h (10111000b), you will see in the table(s) that it stands for mov ax, imm16.
In your first snippet you try to dissect the BBh (10111011b) opcode, but if you consult the same table(s), you will see that it stands for mov bx, imm16.
There is, however, a second way to encode the mov ax, imm16 instruction, using a ModR/M byte like you tried to do in your first snippet: opcode C7h followed by the ModR/M byte C0h and the 16-bit immediate (C7 C0 03 00 for mov ax, 3). This opcode does not have an s-bit; there's no sign extension available, so the immediate always takes two bytes. Because that makes it one byte longer than the short form, this encoding is seldom used by assemblers that care about code size.
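To make the table lookup concrete, here is a minimal sketch in Python that decodes both encodings of mov r16, imm16 discussed above. The function name decode_mov_imm16 and the REG16 list are made up for illustration, and the sketch deliberately handles only these two opcodes in 16-bit mode (no prefixes, no memory operands).

```python
# 16-bit register names in x86 encoding order (register number 0..7).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_mov_imm16(code: bytes):
    """Decode one 'mov r16, imm16' instruction; return (text, length in bytes).

    Hypothetical helper: covers only opcodes B8h..BFh and C7h /0 with mod = 11.
    """
    opcode = code[0]
    if 0xB8 <= opcode <= 0xBF:
        # Short form: the register number is the low 3 bits of the opcode,
        # followed directly by the little-endian 16-bit immediate.
        reg = opcode & 0b111
        imm = int.from_bytes(code[1:3], "little")
        return f"mov {REG16[reg]}, {imm}", 3
    if opcode == 0xC7:
        # Long form: a ModR/M byte follows the opcode.
        modrm = code[1]
        mod = modrm >> 6            # 11b means the r/m field names a register
        reg = (modrm >> 3) & 0b111  # must be 000b for MOV with this opcode
        rm = modrm & 0b111          # destination register number
        if mod == 0b11 and reg == 0:
            imm = int.from_bytes(code[2:4], "little")
            return f"mov {REG16[rm]}, {imm}", 4
    raise ValueError("encoding not handled by this sketch")

print(decode_mov_imm16(bytes([0xB8, 0x03, 0x00])))        # short form
print(decode_mov_imm16(bytes([0xBB, 0x03, 0x00])))        # the BBh opcode
print(decode_mov_imm16(bytes([0xC7, 0xC0, 0x03, 0x00])))  # ModR/M form
```

Both the 3-byte and the 4-byte sequence decode to the same mov ax, 3, which is why a size-conscious assembler picks the first one.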
A similar pair of encodings exists for the ADD, ADC, SUB, SBB, CMP, AND, OR, XOR, and TEST instructions. But for these the short form, the one without the ModR/M byte, only applies to the AX register (AL for byte-sized operands).
You can find all the tables you need in the Intel manuals at https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html
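The pairing of a short AX-only form with a general ModR/M form can be sketched the same way. The following Python snippet is only an illustration: the helper encode_alu_imm16 and its tables are hypothetical names, it covers 16-bit register destinations with a full 16-bit immediate only, and the opcode values should be checked against the tables in the manual.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

# Opcode of the short "op ax, imm16" form (no ModR/M byte).
SHORT_AX = {"add": 0x05, "or": 0x0D, "adc": 0x15, "sbb": 0x1D,
            "and": 0x25, "sub": 0x2D, "xor": 0x35, "cmp": 0x3D,
            "test": 0xA9}

# Value of the reg field in the ModR/M byte that selects the operation
# for the general form with opcode 81h.
GROUP_81 = {"add": 0, "or": 1, "adc": 2, "sbb": 3,
            "and": 4, "sub": 5, "xor": 6, "cmp": 7}

def encode_alu_imm16(op: str, reg: str, imm: int):
    """Return (short_form_or_None, modrm_form) as bytes objects."""
    imm_bytes = (imm & 0xFFFF).to_bytes(2, "little")
    # The short form exists only when the destination is AX.
    short = bytes([SHORT_AX[op]]) + imm_bytes if reg == "ax" else None
    # TEST uses opcode F7h with reg field 0; the others share opcode 81h.
    opcode, digit = (0xF7, 0) if op == "test" else (0x81, GROUP_81[op])
    modrm = (0b11 << 6) | (digit << 3) | REG16.index(reg)
    return short, bytes([opcode, modrm]) + imm_bytes

for op, reg, imm in [("add", "ax", 0x1234), ("cmp", "bx", 3), ("test", "ax", 1)]:
    short, general = encode_alu_imm16(op, reg, imm)
    print(op, reg, short.hex(" ") if short else "(no short form)", "|", general.hex(" "))
```

For add ax, 1234h this yields a 3-byte short form and a 4-byte ModR/M form, while cmp bx, 3 has only the ModR/M form because the destination is not AX.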