It is nearly impossible(*) to provide strict IEEE 754 semantics at reasonable cost when the only floating-point instructions one is allowed to used are the 387 ones. It is particularly hard when one wishes to keep the FPU working on the full 64-bit significand so that the long double type is available for extended precision. The usual “solution” is to do intermediate computations at the only available precision, and to convert to the lower precision at more or less well-defined occasions.
Recent versions of GCC handle excess precision in intermediate computations according to the interpretation laid out by Joseph S. Myers in a 2008 post to the GCC mailing list. This description makes a program compiled with gcc -std=c99 -mno-sse2 -mfpmath=387 completely predictable, to the last bit, as far as I understand. And if by chance it doesn't, it is a bug and it will be fixed: Joseph S. Myers' stated intention in his post is to make it predictable.
Is it documented how Clang handles excess precision (say when the option -mno-sse2 is used), and where?
(*) EDIT: this is an exaggeration. It is slightly annoying but not that difficult to emulate binary64 when one is allowed to configure the x87 FPU to use a 53-bit significand.
Following a comment by R.. below, here is the log of a short interaction of mine with the most recent version of Clang I have :
Hexa:~ $ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
Hexa:~ $ cat fem.c
#include <stdio.h>
#include <math.h>
#include <float.h>
#include <fenv.h>
double x;
double y = 2.0;
double z = 1.0;
int main(){
x = y + z;
printf("%d\n", (int) FLT_EVAL_METHOD);
}
Hexa:~ $ clang -std=c99 -mno-sse2 fem.c
Hexa:~ $ ./a.out
0
Hexa:~ $ clang -std=c99 -mno-sse2 -S fem.c
Hexa:~ $ cat fem.s
…
movl $0, %esi
fldl _y(%rip)
fldl _z(%rip)
faddp %st(1)
movq _x@GOTPCREL(%rip), %rax
fstpl (%rax)
…
For the record, below is what I found by experimentation. The following program shows various behaviors when compiled with Clang:
The results vary with the optimization level:
The cast
(double)that differentiatesr5andr6at-O2has no effect at-O0and for variablesr3andr4. The resultr1is different fromr5at all optimization levels, whereasr6only differs fromr3at-O2.