I have encountered a performance issue in .NET Core 2.1 that I am trying to understand. The code for this can be found here:
https://github.com/mike-eee/StructureActivation
Here is the relevant benchmark code via BenchmarkDotNet:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    static void Main()
    {
        BenchmarkRunner.Run<Program>();
    }

    // Reads the property straight off the newly constructed struct.
    [Benchmark(Baseline = true)]
    public uint? Activated() => new Structure(100).SomeValue;

    // Stores the struct in a named local before reading the property.
    [Benchmark]
    public uint? ActivatedAssignment()
    {
        var selection = new Structure(100);
        return selection.SomeValue;
    }
}

public readonly struct Structure
{
    public Structure(uint? someValue) => SomeValue = someValue;

    public uint? SomeValue { get; }
}
From the outset, I would expect Activated to be faster, as it does not store a local variable, which I have always understood to incur a performance penalty for locating and reserving space in the current stack frame.
However, when running the tests, I get the following results:
// * Summary *
BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.285 (1803/April2018Update/Redstone4)
Intel Core i7-4820K CPU 3.70GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.1.402
[Host] : .NET Core 2.1.4 (CoreCLR 4.6.26814.03, CoreFX 4.6.26814.02), 64bit RyuJIT
DefaultJob : .NET Core 2.1.4 (CoreCLR 4.6.26814.03, CoreFX 4.6.26814.02), 64bit RyuJIT
Method | Mean | Error | StdDev | Scaled |
-------------------- |---------:|----------:|----------:|-------:|
Activated | 4.700 ns | 0.0128 ns | 0.0107 ns | 1.00 |
ActivatedAssignment | 3.331 ns | 0.0278 ns | 0.0260 ns | 0.71 |
The activated structure (without storing a local variable) takes roughly 40% longer; put the other way, the version with the local variable is roughly 30% faster.
For reference, here is the IL courtesy of ReSharper's IL Viewer:
.method /*06000002*/ public hidebysig instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>
Activated() cil managed
{
.custom /*0C00000C*/ instance void [BenchmarkDotNet/*23000002*/]BenchmarkDotNet.Attributes.BenchmarkAttribute/*0100000D*/::.ctor()
= (01 00 01 00 54 02 08 42 61 73 65 6c 69 6e 65 01 ) // ....T..Baseline.
// property bool 'Baseline' = bool(true)
.maxstack 1
.locals /*11000001*/ init (
[0] valuetype StructureActivation.Structure/*02000003*/ V_0
)
// [14 31 - 14 59]
IL_0000: ldc.i4.s 100 // 0x64
IL_0002: newobj instance void valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>/*1B000001*/::.ctor(!0/*unsigned int32*/)/*0A00000F*/
IL_0007: newobj instance void StructureActivation.Structure/*02000003*/::.ctor(valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>)/*06000005*/
IL_000c: stloc.0 // V_0
IL_000d: ldloca.s V_0
IL_000f: call instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32> StructureActivation.Structure/*02000003*/::get_SomeValue()/*06000006*/
IL_0014: ret
} // end of method Program::Activated
.method /*06000003*/ public hidebysig instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>
ActivatedAssignment() cil managed
{
.custom /*0C00000D*/ instance void [BenchmarkDotNet/*23000002*/]BenchmarkDotNet.Attributes.BenchmarkAttribute/*0100000D*/::.ctor()
= (01 00 00 00 )
.maxstack 2
.locals /*11000001*/ init (
[0] valuetype StructureActivation.Structure/*02000003*/ selection
)
// [19 4 - 19 39]
IL_0000: ldloca.s selection
IL_0002: ldc.i4.s 100 // 0x64
IL_0004: newobj instance void valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>/*1B000001*/::.ctor(!0/*unsigned int32*/)/*0A00000F*/
IL_0009: call instance void StructureActivation.Structure/*02000003*/::.ctor(valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32>)/*06000005*/
// [20 4 - 20 31]
IL_000e: ldloca.s selection
IL_0010: call instance valuetype [System.Runtime/*23000001*/]System.Nullable`1/*0100000E*/<unsigned int32> StructureActivation.Structure/*02000003*/::get_SomeValue()/*06000006*/
IL_0015: ret
} // end of method Program::ActivatedAssignment
Upon inspection, Activated has two newobj instructions, whereas ActivatedAssignment has only one, which might be contributing to the difference between the two benchmarks.
My question is: is this expected? I am trying to understand why the benchmark with less code is actually slower than the one with more code. Any guidance/recommendations to ensure that I am following best practices would be greatly appreciated.
It's a bit more clear what's happening if you look at the JITted assembly from your methods:
Obviously, Activated() is doing more work, and that's why it's slower. What it boils down to is a lot of stack shuffling (all references to rsp). I've commented them as best I could, but the Activated() method is a bit convoluted because of the redundant movs. ActivatedAssignment() is much more straightforward.
Ultimately, you're not actually saving stack space by omitting the local variable. The variable has to exist at some point whether you give it a name or not. The IL code you pasted shows a local variable (they call it V_0), which is the temp created by the C# compiler since you didn't create it explicitly.
Where the two differ is that the version with the temp variable only reserves a single evaluation-stack slot (.maxstack 1), and it uses it for both the Nullable<T> and the Structure, hence the shuffling. In the version with the named variable, it reserves 2 slots (.maxstack 2).
Ironically, in the version with the pre-reserved local variable for selection, the JIT is able to eliminate the outer structure and deal only with its embedded Nullable<T>, making for cleaner/faster code.
I'm not sure you can deduce any best practices from this example, but I think it's easy enough to see that the C# compiler is the source of the perf difference. The JIT is smart enough to do the right thing with your struct, but only if it looks a certain way coming in.
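If you want to look at the JITted code for these two methods yourself, one option is BenchmarkDotNet's disassembly diagnoser. The following is only a sketch: it assumes the parameterless [DisassemblyDiagnoser] attribute is available in your BenchmarkDotNet version and supported on your platform, and the class name StructureBenchmarks is just a name I made up to keep the benchmarks separate from Main.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public readonly struct Structure
{
    public Structure(uint? someValue) => SomeValue = someValue;

    public uint? SomeValue { get; }
}

// Assumption: default diagnoser settings are enough to get an assembly
// listing for each benchmark method alongside the usual summary table.
[DisassemblyDiagnoser]
public class StructureBenchmarks // hypothetical name, not from the original repo
{
    // The C# compiler introduces an unnamed temp (V_0 in the IL) here.
    [Benchmark(Baseline = true)]
    public uint? Activated() => new Structure(100).SomeValue;

    // Same logic, but the struct lives in an explicitly named local.
    [Benchmark]
    public uint? ActivatedAssignment()
    {
        var selection = new Structure(100);
        return selection.SomeValue;
    }
}

public class Program
{
    static void Main() => BenchmarkRunner.Run<StructureBenchmarks>();
}
```

Run it in a Release build. The exact instructions you see will depend on the runtime and JIT version, so a newer runtime may not reproduce the difference measured above.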