I created a simple benchmark out of curiosity, but cannot explain the results.
As benchmark data, I prepared an array of structs with some random values. The preparation phase is not benchmarked:
struct Val
{
    public float val;
    public float min;
    public float max;
    public float padding;
}

const int iterations = 1000;
Val[] values = new Val[iterations];
// fill the array with randoms
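The fill step itself is not shown above. A minimal sketch of what it could look like follows; the helper name BenchmarkData.Create, the fixed seed, and the [0, 1) value range are all assumptions made for illustration, not the code that was actually benchmarked.

```csharp
using System;

struct Val
{
    public float val;
    public float min;
    public float max;
    public float padding;
}

static class BenchmarkData
{
    // Hypothetical helper: builds the input array once, outside the measured code.
    public static Val[] Create(int count, int seed = 42)
    {
        var rng = new Random(seed); // fixed seed (an assumption) keeps runs comparable
        var values = new Val[count];
        for (int i = 0; i < count; ++i)
        {
            // Draw two bounds and order them so that min <= max.
            float a = (float)rng.NextDouble();
            float b = (float)rng.NextDouble();
            values[i].min = Math.Min(a, b);
            values[i].max = Math.Max(a, b);
            // The value may land below, inside, or above [min, max],
            // so all three branches of a clamp get exercised.
            values[i].val = (float)rng.NextDouble();
        }
        return values;
    }
}
```

Calling something like this from the benchmark class constructor or a setup method keeps the preparation out of the measured loops.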
Basically, I wanted to compare these two clamp implementations:
static class Clamps
{
    public static float ClampSimple(float val, float min, float max)
    {
        if (val < min) return min;
        if (val > max) return max;
        return val;
    }

    public static T ClampExt<T>(this T val, T min, T max) where T : IComparable<T>
    {
        if (val.CompareTo(min) < 0) return min;
        if (val.CompareTo(max) > 0) return max;
        return val;
    }
}
Here are my benchmark methods:
[Benchmark]
public float Extension()
{
    float result = 0;
    for (int i = 0; i < iterations; ++i)
    {
        ref Val v = ref values[i];
        result += v.val.ClampExt(v.min, v.max);
    }
    return result;
}

[Benchmark]
public float Direct()
{
    float result = 0;
    for (int i = 0; i < iterations; ++i)
    {
        ref Val v = ref values[i];
        result += Clamps.ClampSimple(v.val, v.min, v.max);
    }
    return result;
}
I'm using BenchmarkDotNet version 0.10.12 with two jobs:
[MonoJob]
[RyuJitX64Job]
And these are the results I get:
BenchmarkDotNet=v0.10.12, OS=Windows 7 SP1 (6.1.7601.0)
Intel Core i7-6920HQ CPU 2.90GHz (Skylake), 1 CPU, 8 logical cores and 4 physical cores
Frequency=2836123 Hz, Resolution=352.5940 ns, Timer=TSC
[Host] : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3062.0
Mono : Mono 5.12.0 (Visual Studio), 64bit
RyuJitX64 : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3062.0
Method | Job | Runtime | Mean | Error | StdDev |
---------- |---------- |-------- |----------:|----------:|----------:|
Extension | Mono | Mono | 10.860 us | 0.0063 us | 0.0053 us |
Direct | Mono | Mono | 11.211 us | 0.0074 us | 0.0062 us |
Extension | RyuJitX64 | Clr | 5.711 us | 0.0014 us | 0.0012 us |
Direct | RyuJitX64 | Clr | 1.395 us | 0.0056 us | 0.0052 us |
I can accept that Mono is somewhat slower here in general. But what I don't understand is:
Why does Mono run the Direct method slower than Extension, keeping in mind that Direct uses a very simple comparison method whereas Extension uses a method with additional method calls?
RyuJIT, by contrast, shows a roughly 4x advantage for the simple method (1.395 us vs. 5.711 us).
Can anyone explain this?
Since nobody wanted to dig into the disassembly, I'll answer my own question.
It seems that the reason is the native code generated by the JITs, not the array bounds checking or caching issues mentioned in the comments.
RyuJIT generates very efficient code for the ClampSimple method. It uses the CPU's native ucomiss instructions to compare floats and fast movaps instructions to move those floats between CPU registers.
The extension method is slower because it makes a couple of function calls to System.Single.CompareTo(System.Single).
Now for the native code Mono produces for the ClampSimple method. Mono's code converts floats to doubles and compares them using comisd. Furthermore, there are strange float ➞ double ➞ float conversion round trips when preparing the return value, and there is much more moving around between memory and registers. This explains why Mono's code for the simple method is slower than RyuJIT's.
Mono's code for the Extension method is very similar to RyuJIT's, but again with the strange float ➞ double ➞ float round trips.
It seems that RyuJIT can generate more efficient code for handling floats. Mono treats floats as doubles and converts the values each time, which also causes additional value transfers between CPU registers and memory.
Note that all this is valid for Windows x64 only. I don't know how this benchmark performs on Linux or Mac.
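If the extension-method call syntax is wanted without the CompareTo calls, one option is a float-only overload that uses direct comparisons. The sketch below is an illustration only: FloatClamps and ClampFloat are hypothetical names, and I have not benchmarked it, so whether either JIT treats it exactly like ClampSimple is an assumption.

```csharp
static class FloatClamps
{
    // Hypothetical float-only extension method: same call syntax as ClampExt,
    // but it compares the floats directly instead of calling CompareTo.
    public static float ClampFloat(this float val, float min, float max)
    {
        if (val < min) return min;
        if (val > max) return max;
        return val;
    }
}
```

In the Extension benchmark the call would then read v.val.ClampFloat(v.min, v.max).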