We want to use the Intel C compiler with runtime CPU dispatching enabled (this is on the Windows platform). We use the options /arch:IA32 plus /QaxSSE2, but none of the /QxFoo options. To our understanding, this should produce a binary that runs on any IA32 (x86) processor, but still uses an SSE2-optimized code path on processors that actually support the SSE2 instruction set.
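To make that expectation concrete, here is a hand-written sketch of the behaviour we assumed the dispatcher provides. It is only an illustration and not what the Intel compiler actually generates: the names cpu_has_sse2, sum_baseline, sum_sse2 and select_sum are made up, and the "SSE2" variant is just a scalar placeholder. It also assumes the CPUID instruction itself is available, which is true for the Pentium III.

```c
#include <stddef.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>            /* __cpuid on MSVC-compatible compilers */
#define PROBE_MSVC_X86 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>             /* __get_cpuid on GCC/Clang */
#define PROBE_GNU_X86 1
#endif

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
static int cpu_has_sse2(void)
{
#if defined(PROBE_MSVC_X86)
    int regs[4];               /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    return (int)(((unsigned int)regs[3] >> 26) & 1u);
#elif defined(PROBE_GNU_X86)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;              /* leaf 1 not supported */
    return (int)((edx >> 26) & 1u);
#else
    return 0;                  /* not an x86 build: no SSE2 path */
#endif
}

typedef long (*sum_fn)(const int *, size_t);

/* Baseline path: must contain nothing beyond plain IA32 instructions. */
static long sum_baseline(const int *v, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Placeholder for the SSE2 path. A real variant would use SSE2 code;
   here it simply reuses the scalar loop so the sketch stays portable. */
static long sum_sse2(const int *v, size_t n)
{
    return sum_baseline(v, n);
}

/* The decision we expected: take the SSE2 path only if the CPU has SSE2. */
static sum_fn select_sum(void)
{
    return cpu_has_sse2() ? sum_sse2 : sum_baseline;
}
```

With this model, a call such as select_sum()(data, count) can never reach SSE2 code on a Pentium III, which is the behaviour we expected from /arch:IA32 plus /QaxSSE2.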
However, testing reveals that on a processor without SSE2 support (e.g. Pentium III) the binary crashes with an "illegal instruction" exception! Interestingly, removing only /QaxSSE2 and leaving everything else as-is produces a binary that works perfectly fine on a processor without SSE2 support.
Another interesting observation: using /arch:IA32 plus /QaxSSE2 together with /Ob0 (which disables inlining!) produces a binary that also works perfectly fine on a processor without SSE2 support.
At this point it would seem that either runtime CPU dispatching raises the CPU requirement of the "base" code path to SSE2 regardless of the /arch:IA32 option, or function inlining and runtime CPU dispatching don't work together. We cannot find any mention of either in the Intel documentation. This is very important information, so we think it needs to be mentioned in the documentation!
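One experiment that might help separate the two explanations, sketched below and not yet run by us: keep inlining enabled globally (no /Ob0), but mark a single suspect function as non-inlinable. The names scale_add and run_probe are hypothetical stand-ins for real code, and the NOINLINE macro is our own wrapper around the compiler-specific keywords.

```c
#include <stddef.h>

/* Our own wrapper macro around the compiler-specific "do not inline" keyword. */
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical hot loop of the kind an auto-vectorizer tends to pick up.
   Marked NOINLINE so that it stays a separate function even without /Ob0. */
static NOINLINE void scale_add(float *dst, const float *src, float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        dst[i] += k * src[i];
}

/* Hypothetical caller: fills two small arrays and runs the loop once. */
static float run_probe(void)
{
    float a[8], b[8];
    size_t i;
    for (i = 0; i < 8; ++i) {
        a[i] = (float)i;
        b[i] = 1.0f;
    }
    scale_add(a, b, 2.0f, 8);
    return a[7];               /* 7 + 2 * 1 */
}
```

If a build like this still crashes on the Pentium III, the inlining explanation becomes less likely. If it runs, that would point toward inlining pulling SSE2 code into the base path. Either way, it would only be a hint and not proof.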
Can anybody confirm the observation or clarify what's going on?
Thank you!