I'm currently experiencing random floating point errors when compiling for x86 targets using VC++ 11 (CTP Update 1). See the short example "test.cpp" below, and compile using:
cl /GL /O2 /EHsc test.cpp /link /MACHINE:X86
The output should be 10 = 10, but the program prints 10 = 0 when /GL (whole program optimization) is enabled. The problem seems to be that get_scaling_factor() pushes the result on the floating point stack, but the calling function is expecting it in the SSE register XMM0.
Question: am I missing something obvious, or is this really a bug? The test program, of course, doesn't make sense, as it is a stripped down test case.
test.cpp:
#include <iostream>
template <typename T>
inline T get_scaling_factor(int units)
{
    switch (units)
    {
    case 0: return 1;  
    case 1: return 10;  
    case 2: return 100;  
    case 3: return 1000;  
    case 4: return 10000;  
    case 5: return 100000;  
    case 6: return 1000000;  
    case 7: return 10000000;  
    case 8: return 100000000;  
    case 9: return 1000000000; 
    default: return 1;
    }
}
template <int targetUnits, typename T>
inline T scale(T value, int sourceUnits)
{
    return value   * get_scaling_factor<T>(sourceUnits) 
                   / get_scaling_factor<T>(targetUnits);
}
__declspec(noinline)
double scale(double value, int units) 
{
    return scale<9>(value, units);
}
int main()
{
    std::cout << "10 = " << scale(1e9, 1) << std::endl;
}
Update
Issue confirmed by Microsoft. It even affects straightforward code like this:
#include <stdio.h>
double test(int a)
{
    switch (a)
    {
    case 0: return 1.0;
    case 1: return 10.0;
    case 2: return 100.0;
    case 3: return 1000.0;
    case 4: return 10000.0;
    case 5: return 100000.0;
    case 6: return 1000000.0;
    case 7: return 10000000.0;
    case 8: return 100000000.0;
    case 9: return 1000000000.0;
    default: return 1.0;
    }
}
void main()
{
    int nine = 9;
    double x = test(nine);
    x /= test(7);
    int val = (int)x;
    if (val == 100)
        printf("pass");
    else 
        printf("fail, val is %d", val);
}
 
                        
Yes, this is definitely a code optimizer bug and I had no trouble reproducing it. Optimizer bugs are usually associated with inlining but that's not the case here. This bug got introduced by the heavy code-gen changes in VS2012 that support the new auto-vectorizing feature.
In a nutshell, the get_scaling_factor() function returns the result on the FPU stack. The code generator properly emits the instruction to retrieve it from the stack and store it in an XMM register. But the optimizer inappropriately removes that code entirely, as though it assumes that the function result was already stored in XMM0.
A workaround is hard to come by; specializing the template function for double has no effect. Disabling optimization with #pragma optimize works.
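For reference, here is a minimal sketch of what the #pragma optimize workaround can look like when applied to the test case from the question. It assumes the pragma pair is placed around the non-inlined scale() overload; that placement is my assumption, so rerun the repro on your own build to confirm it helps. NOINLINE and check_scale() are hypothetical helper names added only so the sketch also compiles with compilers other than MSVC.

```cpp
#include <cassert>

// Assumption: shim so the MSVC-only attribute becomes a no-op elsewhere.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE
#endif

// Same lookup function as in the question.
template <typename T>
inline T get_scaling_factor(int units)
{
    switch (units)
    {
    case 0: return 1;
    case 1: return 10;
    case 2: return 100;
    case 3: return 1000;
    case 4: return 10000;
    case 5: return 100000;
    case 6: return 1000000;
    case 7: return 10000000;
    case 8: return 100000000;
    case 9: return 1000000000;
    default: return 1;
    }
}

template <int targetUnits, typename T>
inline T scale(T value, int sourceUnits)
{
    return value * get_scaling_factor<T>(sourceUnits)
                 / get_scaling_factor<T>(targetUnits);
}

#if defined(_MSC_VER)
#pragma optimize("", off)  // disable optimizations for the functions defined below
#endif
NOINLINE double scale(double value, int units)
{
    return scale<9>(value, units);
}
#if defined(_MSC_VER)
#pragma optimize("", on)   // restore the optimization settings from the command line
#endif

// Hypothetical sanity check: 1e9 * 10 / 1e9 is exactly 10 in double arithmetic.
inline void check_scale()
{
    assert(scale(1e9, 1) == 10.0);
}
```

Calling check_scale() from main(), or printing scale(1e9, 1) as in the original test.cpp, shows whether the build now produces 10. The cost is that everything between the two pragmas is compiled without optimization, so it is worth keeping that region as small as possible.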
Your repro code is very good, and Microsoft will have no trouble fixing the bug from it. You can file a feedback report at connect.microsoft.com; just link to this question. If you are in a hurry, you can contact Microsoft Support, although I'd imagine they'll give you the same workaround to tide you over until the service pack.
UPDATE: fixed in VS2013.