GCC 12 (MinGW-w64): how to enable fused multiply-add code generation


I apologize in advance if the answer to my question is obvious, but I have been googling all day and searching here as well without finding anything relevant.

I am using GCC 12 (MinGW-w64) on my x64 Windows i7 machine, and I cannot get GCC to generate any floating-point multiply-add instructions.

The simplest case:

float func(float a, float b, float c)
{
   return a*b+c;
}

produces this assembly:

mulss %xmm1, %xmm0
addss %xmm2, %xmm0
ret

No fused multiply/add instruction!

EDIT: this output is produced with the -O3 option

I tried many optimization and CPU target options, including -ffast-math and -march=corei7, to no avail.

EDIT: sorry, I made a mistake. I made a typo when trying -mfma, so I thought it was set when it was not. The first version of my question wrongly stated that I had tried it.

Am I missing something elementary? How can I get GCC to generate those multiply-add instructions automatically?

I then thought I had to do it explicitly, so I tried the fmaf() function, but it simply compiles to a jmp to a library function, which is even worse!
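For illustration, here is a minimal sketch of one way to get an inline FMA from an explicit call without changing the flags for the whole file. It assumes GCC's per-function target attribute and the builtin form of fmaf(); the names mul_add_fma and mul_add are hypothetical, and the wrapper falls back to a plain (unfused) multiply and add when the CPU reports no FMA support.

```c
#if defined(__x86_64__)
/* Compiled as if -mfma were given, but for this function only. */
__attribute__((target("fma")))
static float mul_add_fma(float a, float b, float c)
{
    /* Builtin form of fmaf(); with FMA enabled for this function the
       compiler can expand it inline instead of calling the library. */
    return __builtin_fmaf(a, b, c);
}
#endif

/* Hypothetical wrapper: uses the FMA version only when the CPU has it. */
float mul_add(float a, float b, float c)
{
#if defined(__x86_64__)
    if (__builtin_cpu_supports("fma"))
        return mul_add_fma(a, b, c);
#endif
    /* Unfused fallback: rounds twice, so the last bit may differ. */
    return a * b + c;
}
```

Compiling with -O2 -S and looking at mul_add_fma in the assembly output should show whether a vfmadd instruction was emitted in place of the library jump.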

UPDATE: it looks like, together with -O3 (which I always use by default anyway), I have to set either -mfma or -march=haswell for the FMA instructions to be generated. In my benchmarks they bring a substantial speed improvement in time-critical code with chains of additions and multiplications. What I don't fully understand is why -march=corei7 or -march=corei7-avx alone is not enough. If FMA generation were disabled because of the stack alignment bug in MinGW (as somebody mentioned in the comments), then it should also be disabled with -march=haswell...
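A note on the -march values: in GCC, corei7 and corei7-avx are names for the Nehalem and Sandy Bridge targets, and on Intel CPUs FMA only arrived with Haswell, so neither of those values implies -mfma. A quick way to see what a given set of flags enables is to test the __FMA__ macro, which GCC predefines when FMA code generation is on. The sketch below assumes that macro; the function name fma_codegen_enabled is hypothetical.

```c
#include <stdio.h>

/* Returns 1 if this file was compiled with FMA code generation enabled
   (for example via -mfma or -march=haswell), 0 otherwise. */
int fma_codegen_enabled(void)
{
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

/* Prints the result so different flag sets can be compared. */
void print_fma_status(void)
{
    printf("FMA code generation: %s\n",
           fma_codegen_enabled() ? "enabled" : "not enabled");
}
```

Building the same file once with -march=corei7-avx and once with -march=haswell and calling print_fma_status() from main should make the difference between the two targets visible.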

Thanks
