I'm learning data movement (MOV) in assembly.
I compiled some code on an x86_64 Ubuntu 18.04 machine to look at the generated assembly:
typedef unsigned char src_t;
typedef xxx dst_t;

dst_t cast(src_t *sp, dst_t *dp) {
    *dp = (dst_t)*sp;
    return *dp;
}
Here src_t is unsigned char. For dst_t, I tried char, short, int and long.
The results are shown below:
// typedef unsigned char src_t;
// typedef char dst_t;
// movzbl (%rdi), %eax
// movb %al, (%rsi)

// typedef unsigned char src_t;
// typedef short dst_t;
// movzbl (%rdi), %eax
// movw %ax, (%rsi)

// typedef unsigned char src_t;
// typedef int dst_t;
// movzbl (%rdi), %eax
// movl %eax, (%rsi)

// typedef unsigned char src_t;
// typedef long dst_t;
// movzbl (%rdi), %eax
// movq %rax, (%rsi)
Why is movzbl used in every case? Shouldn't the load instruction correspond to dst_t?
Thanks!
If you're wondering why not movzbw (%rdi), %ax for short, that's because writing to 8-bit and 16-bit partial registers has to merge with the previous high bytes.

Writing a 32-bit register like EAX implicitly zero-extends into the full RAX, avoiding a false dependency on the old value of RAX or any ALU merging uop. (See: Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?)
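
As a rough C-level model of that load-wide, store-narrow pattern, here is a minimal sketch. The helper names cast_to_short and cast_to_long are hypothetical (not from the question), and the uint32_t temporary only stands in for EAX; a compiler is free to generate different code for it.

```c
#include <stdint.h>

/* Hypothetical model of: movzbl (%rdi), %eax ; movw %ax, (%rsi) */
short cast_to_short(const unsigned char *sp, short *dp)
{
    uint32_t wide = *sp;   /* byte load, zero-extended to a 32-bit value */
    *dp = (short)wide;     /* store only the low 16 bits of that value */
    return *dp;
}

/* Hypothetical model of: movzbl (%rdi), %eax ; movq %rax, (%rsi) */
long cast_to_long(const unsigned char *sp, long *dp)
{
    uint32_t wide = *sp;   /* same 32-bit zero-extending load */
    *dp = (long)wide;      /* widening a value in 0..255 leaves it unchanged */
    return *dp;
}
```

In both helpers the only real conversion is the zero-extending load; the store just picks how many low bytes of the result to write.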
The "normal" way to load a byte on x86 is with movzbl or movsbl, same as on a RISC machine like ARM ldrb or ldrsb, or MIPS lbu / lb.

The weird-CISC thing that GCC usually avoids is a merge with the old value that replaces only the low bits, like movb (%rdi), %al. (See: Why doesn't GCC use partial registers?) Clang is more reckless and will more often write partial regs, not just read them for stores. You might well see clang load into just %al and store when dst_t is signed char.

If you're wondering why not movsbl (%rdi), %eax (sign-extension):

The source value is unsigned, therefore zero-extension (not sign-extension) is the correct way to widen it according to C semantics. To get movsbl, you'd need return (int)(signed char)c.

In *dp = (dst_t)*sp; the cast to dst_t is already implicit from the assignment to *dp.

The value-range for unsigned char is 0..255 (on x86 where CHAR_BIT = 8).

Zero-extending this to signed int can produce a value range from 0..255, i.e. preserving every value as signed non-negative integers.

Sign-extending this to signed int would produce a value range from -128..+127, changing the value of unsigned char values >= 128. That conflicts with C semantics for widening conversions preserving values.

It has to widen at least as wide as dst_t. It turns out that widening to 64-bit by using movzbl (with the top 32 bits handled by implicit zero-extension writing a 32-bit reg) is the most efficient way to widen at all.

Storing to *dp is a nice demo that the asm is for a dst_t with a width other than 32-bit.

Anyway, note that there's only one conversion happening. Your src_t gets converted to dst_t in al/ax/eax/rax with a load instruction, and stored to dst_t of whatever width. And also left there as the return value.

A zero-extending load is normal even if you're just going to read the low byte of that result.
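
To see the zero-extension vs. sign-extension difference at the C level, here is a small sketch. The function names are hypothetical, and the negative result in the second function relies on the conversion of an out-of-range value to signed char, which is implementation-defined in C before C23 (GCC and clang on x86-64 wrap modulo 256). One would expect a compiler to use movzbl for the first function and movsbl for the second, but that depends on the compiler and options.

```c
/* Unsigned source: widening to int must preserve the value 0..255,
   so this is a zero-extending load. */
int load_zero_extended(const unsigned char *p)
{
    return *p;
}

/* Reinterpreting the byte as signed char first asks for sign-extension.
   Assumption: out-of-range conversion to signed char wraps modulo 256,
   as it does with GCC and clang on x86-64. */
int load_sign_extended(const unsigned char *p)
{
    return (int)(signed char)*p;
}
```

For a byte holding 200, the first function returns 200 and the second returns -56 under that assumption; for bytes below 128 the two agree.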