I apologize in advance in case the answer to my question is obvious, but trust me, I have been googling the whole day and searched here as well without finding anything relevant.
I am using GCC 12 (MinGW x64) on my x64 Windows i7 setup. I can't manage to get GCC to generate any floating-point fused multiply-add (FMA) instructions.
The simplest case:
```c
float func(float a, float b, float c)
{
    return a*b+c;
}
```
produces this assembly:
```asm
mulss %xmm1, %xmm0
addss %xmm2, %xmm0
ret
```
No fused multiply/add instruction!
EDIT: this output is produced with the `-O3` option.
I tried all possible optimization and CPU target options, including `-ffast-math` and `-march=corei7`, to no avail.
EDIT: Sorry, I made a mistake: I mistyped `-mfma` when trying it, so I thought it was set when it was not. I apologize for wrongly stating in the first version of my question that I had tried it.
Am I missing something elementary? How can I get GCC to generate these multiply-add instructions automatically?
I then thought I had to do it explicitly, so I tried the `fmaf()` function, but it simply results in a `jmp` to a library function, which is even worse!
UPDATE: it looks like, together with `-O3` (which I always use by default anyway), I have to set either `-mfma` or `-march=haswell` for the FMA instructions to be generated. Some benchmarks I ran confirm that these instructions bring a substantial speed improvement in time-critical code with chains of sums and multiplications.
What I don't fully understand is why simply using `-march=corei7` or `-march=corei7-avx` is not enough. If FMA generation were disabled because of the stack alignment bug with MinGW (as somebody mentioned in the comments), then it should also be disabled when specifying `-march=haswell`...
Thanks