Why is an extended-precision floating-point number not printed correctly in Windows x64 assembly?


The number I want to print is declared as real_number_1 dt 1.234567e20. I wrote the following code to print it, but it prints 0.000000e+00.

;code

bits 64
default rel

segment .bss
    temp resq 1
segment .data
    real_number_1 dt 1.234567e20 ; extended-precision float
    format_1 db "real_number_1: %le", 0xa, 0
segment .text
global main
extern ExitProcess
extern printf



main:

    push    rbp
    mov     rbp, rsp
    sub     rsp, 32

    fld     TWORD [real_number_1]       ; load 80-bit floating-point value into ST0 (pushes onto the FPU stack)
    fistp   QWORD [temp]                ; convert ST0 to integer and place in reserved mem location and also pop the FPU stack
    lea     rcx, [format_1]             ; first argument: format string
    cvtsi2sd xmm0, DWORD [temp]         ; convert from integer to double
    movq     rdx, xmm0                  ; move double to 2nd argument location
    call    printf

    xor     rax, rax
    call    ExitProcess

Answer by Peter Cordes

1.234567e20 is greater than INT64_MAX (2^63-1, about 9.22e18), so fistp produces 0x8000000000000000, which Intel calls the "integer indefinite" value (see footnote 1). Check with a debugger.
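The range argument above can be sketched in C. The helper name fits_in_int64 is hypothetical, and the check is a conservative illustration of the int64_t range rather than an exact model of fistp's rounding behaviour.

```c
#include <stdbool.h>

/* Hypothetical helper: true when x lies in [-2^63, 2^63), the range in
   which a conversion to a signed 64-bit integer cannot overflow.
   NaN fails both comparisons, so it is rejected as well. */
static bool fits_in_int64(long double x)
{
    return x >= -0x1p63L && x < 0x1p63L;
}
```

With this check, fits_in_int64(1.234567e20L) is false, which is the case where fistp stores the integer indefinite value instead of a meaningful result.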

Actually, according to the comments, you already did check, by passing this 64-bit integer to printf("%lf", ...), which interpreted it as a double (IEEE binary64) bit pattern representing negative zero (-0.0): just the sign bit set, the rest zero. That's not in general a useful way to print a 64-bit integer (normally use %lld or %llx), but in this case it essentially confirms the bit pattern.
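To see why that bit pattern prints as negative zero, here is a small C sketch that reinterprets a 64-bit pattern as a double. The function names are made up for illustration, and the sketch assumes double is IEEE binary64, as it is on x86-64.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double without converting the value.
   memcpy is used instead of pointer casts to avoid strict-aliasing issues. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Negative zero compares equal to 0.0 but has its sign bit set. */
static bool is_negative_zero(double d)
{
    return d == 0.0 && signbit(d);
}
```

Here is_negative_zero(bits_to_double(0x8000000000000000ULL)) is true, matching the -0.0 that printf showed.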


Then you load the low 4 bytes of that (0) and convert it to double (0.0), and pass it to printf. Looks like you're calling printf correctly.

I have no idea why you only read the low dword of the qword integer you stored. If you wanted to convert to dword, you could have used fistp with a dword operand. That would have produced the 32-bit "integer indefinite" value, 0x80000000, which as a 2's complement integer represents -2147483648, so you'd have ended up with a double representing that value.

Or if you used cvtsi2sd xmm0, QWORD [temp], you'd get -9223372036854775808.0. (The magnitude is a power of 2, specifically 2^63, so double can represent it exactly.)
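The three outcomes described above can be compared side by side in C. This is only a model of what the integer-to-double step sees: the stored value is taken as given (INT64_MIN, i.e. 0x8000000000000000), and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Models cvtsi2sd xmm0, DWORD [temp]: only the low 32 bits of the
   stored qword are read (little-endian) and converted as a signed int. */
static double low_dword_to_double(int64_t stored)
{
    uint32_t low_bits = (uint32_t)((uint64_t)stored & 0xFFFFFFFFu);
    int32_t low;
    memcpy(&low, &low_bits, sizeof low);   /* keep the bits, change the type */
    return (double)low;
}

/* Models cvtsi2sd xmm0, QWORD [temp]: the whole 64-bit integer is converted. */
static double qword_to_double(int64_t stored)
{
    return (double)stored;
}

/* Models fistp DWORD followed by a dword cvtsi2sd on the 32-bit
   integer indefinite value 0x80000000. */
static double dword_indefinite_to_double(void)
{
    return (double)INT32_MIN;
}
```

For stored = INT64_MIN, low_dword_to_double gives 0.0 (what the question's code prints), qword_to_double gives -9223372036854775808.0, and dword_indefinite_to_double gives -2147483648.0.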


Despite your title, your code does not even try to print an extended-precision float. That's just as well, because you can't with MSVCRT's printf on Windows: MSVC has no format for the 80-bit x87 type, since it uses long double = double. So even printf("%Le", ...) won't help unless you use a different printf, one designed for use with gcc -mlong-double-80, where long double is the 80-bit type. (See the related question "printf and long double".)

One thing you could do with this huge number is use fstp qword to convert it to double, and pass that (in XMM1 and RDX) to printf("%e\n", ...). It's well within the range of DBL_MAX, so its mantissa would just get rounded to the nearest representable double; it would only overflow if the tbyte value were much larger, with an exponent too large for a double.
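The same narrowing can be sketched in C, where a cast from long double to double plays the role of fstp qword. The helper names are hypothetical. With MSVC, long double is already a double, so the cast changes nothing there; with a compiler that uses the 80-bit type, it rounds to the nearest double under the default rounding mode.

```c
#include <stdio.h>

/* Narrow to double, analogous to fstp qword. */
static double narrow_to_double(long double x)
{
    return (double)x;
}

/* Format the narrowed value with plain %e, which expects a double.
   Returns the number of characters snprintf would have written. */
static int format_as_double(char *buf, size_t n, long double x)
{
    return snprintf(buf, n, "%e", narrow_to_double(x));
}
```

A caller would pass a buffer such as char buf[32] and the value 1.234567e20L, then print buf. The number of exponent digits in the output depends on the C runtime in use.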

Your current code is really weird. Converting the value to a 64-bit integer and back doesn't work for values outside the range of int64_t, and for small values it rounds to the nearest integer. Truncating that int64_t to int32_t (by using dword operand-size when reloading the stored value in cvtsi2sd xmm0, DWORD [temp]) makes even less sense. I wouldn't describe that as "printing an extended-precision float"!

Unless you're thinking of this as converting to uint32_t, in which case yes, converting to int64_t and truncating produces correct results for all values that fit in uint32_t, unlike a dword operand-size conversion, which would convert to int32_t. In C, it's UB to convert an out-of-range float to integer, so C compilers don't have to care what result that produces for out-of-range values.
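A C sketch of that convert-wide-then-truncate idea follows. The function name is hypothetical, and the input is assumed to lie in the uint32_t range. Note one difference from the assembly: a C cast truncates toward zero, while fistp rounds according to the current x87 rounding mode (round-to-nearest by default).

```c
#include <stdint.h>

/* Assumes 0 <= x < 2^32. Every such value fits in int64_t, so the first
   conversion is well-defined; values outside int64_t's range would be UB. */
static uint32_t double_to_u32_via_i64(double x)
{
    int64_t wide = (int64_t)x;   /* signed 64-bit conversion, always in range here */
    return (uint32_t)wide;       /* well-defined: keeps the low 32 bits (mod 2^32) */
}
```

For example, 3000000000.0 does not fit in int32_t, but double_to_u32_via_i64(3000000000.0) yields 3000000000 as expected.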

(x86 doesn't have float to unsigned integer conversion until AVX-512. Fun fact: the integer indefinite value for those is 0xffffffff..., the max unsigned value for the type.)

Anyway, you're just printing a double result of two conversions; things go wrong way before that, as a debugger would show you.


Footnote 1: https://www.felixcloutier.com/x86/fist:fistp doesn't give the numeric value of the "integer indefinite" value; presumably that's defined in vol. 1 or vol. 3 of the SDM. But the manual entry for cvtsd2si (the SSE2 double->integer conversion) does give the numeric values for 32-bit and 64-bit operand-size.