Why is this ECS benchmark not showing the expected performance improvements despite using CPU cache?


I am trying to implement an Entity Component System (ECS) in C# using structs or arrays, but the performance is not that much better than using classes and objects. Despite utilizing techniques such as CPU caching and data locality, the BenchmarkDotNet results are not showing the expected improvement.

Regarding the underwhelming results, I wonder if I am doing something wrong or whether the design just has less impact with today's hardware and software.

BenchmarkDotNet=v0.13.4, OS=Windows 11 (10.0.22621.963)
AMD Ryzen 5 5600X, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.102
  [Host]     : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2


|          Method |     Mean |   Error |  StdDev |
|---------------- |---------:|--------:|--------:|
|         Structs | 128.2 us | 0.83 us | 0.78 us |
|         Classes | 122.5 us | 0.17 us | 0.15 us |
| ComponentArrays | 203.6 us | 0.53 us | 0.49 us |
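
using System;
using System.Numerics;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

internal class Program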
{

    static void Main(string[] args)
    {
        BenchmarkRunner.Run<Benchmark>();
    }
}

struct StructEntity
{
    public int Age;
    public Vector3 Position;
    public float Health;
}

class ClassEntity
{
    public int Age;
    public Vector3 Position;
    public float Health;
}

public class Benchmark
{
    private readonly StructEntity[] _structs;
    private readonly ClassEntity[] _classes;

    private readonly int[] _ageComponents;
    private readonly Vector3[] _positionComponents;
    private readonly float[] _healthComponents;

    private const int size = 50000;
    private static Random random = new();
    
    public Benchmark()
    {
        _structs = new StructEntity[size];
        _classes = new ClassEntity[size];

        _ageComponents = new int[size];
        _positionComponents = new Vector3[size];
        _healthComponents = new float[size];


        for (var i = 0; i < _structs.Length; i++)
        {
            var age = random.Next(1, 100);
            var health = (float)random.NextDouble();
            var position = new Vector3((float)random.NextDouble(), (float)random.NextDouble(), (float)random.NextDouble());

            // structs
            var structEntity = new StructEntity();
            structEntity.Age = age;
            structEntity.Health = health;
            structEntity.Position = position;
            _structs[i] = structEntity;

            // classes
            var classEntity = new ClassEntity();
            _classes[i] = classEntity;
            classEntity.Age = age;
            classEntity.Health = health;
            classEntity.Position = position;

            // component arrays
            _healthComponents[i] = health;
            _ageComponents[i] = age;
            _positionComponents[i] = position;
        }
    }

    [Benchmark]
    public int Structs()
    {
        int count = 0;
        
        for (var i = 0; i < size; i++)
        {
            ref var structEntity = ref _structs[i];
            if (structEntity.Age > 30 && structEntity.Health < 0.5)
            {
                count++;
                structEntity.Position = new Vector3(structEntity.Age, 101, structEntity.Age * 2);
                structEntity.Age *= 3;
                structEntity.Health *= 3;
            }
        }
        return count;
    }
    
    [Benchmark]
    public int Classes()
    {
        int count = 0;
        for (var i = 0; i < size; i++)
        {
            var classEntity = _classes[i];
            if (classEntity.Age > 30 && classEntity.Health < 0.5)
            {
                count++;
                classEntity.Position = new Vector3(classEntity.Age, 101, classEntity.Age * 2);
                classEntity.Age *= 3;
                classEntity.Health *= 3;
            }
        }
        return count;
    }

    [Benchmark]
    public int ComponentArrays()
    {
        int count = 0;
        for (var i = 0; i < size; i++)
        {
            ref Vector3 position = ref _positionComponents[i];
            ref int age = ref _ageComponents[i];
            ref float health = ref _healthComponents[i];

            if (age > 30 && health < 0.5 && position.X < position.Z)
            {
                count++;
                position = new Vector3(age, 101, age * 2);
                age *= 3;
                health *= 3;
            }
        }
        return count;
    }
}

There is 1 best solution below


Much of the benefit of an ECS and, more generally, of a structure-of-arrays (SoA) representation lies in being able to access small parts of the conceptual entity rather than the whole. That is true even for sequential memory access patterns. The exception is an array-of-structures (AoS) layout that wastes a lot of memory on alignment padding, where SoA also helps by shrinking the stride between entities.

For example, a rigid body physics system might only care about convex hulls and motion components, while the scene carries a boatload of other data such as meshes and textures. With an AoS representation, parts of that other data get pulled into the CPU cache without ever being used, and the stride from one entity's components to the next is massive, so each cache line fills up with data that is irrelevant to the operation at hand. In a real-world ECS you might have 40+ component types while a given system touches only 2 or 3 of them at once, and that is where the performance improvement shows up. You won't see much of it when there are only three component types in total and a single system accesses all of them.
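To make the "system touches 2 of many components" case concrete, here is a minimal sketch. The `FatEntity` struct, its cold fields, and the `MotionDemo` class are hypothetical names made up for illustration, and `Vector3` is assumed to be `System.Numerics.Vector3`. Both methods compute the same result; the difference is only in how much unrelated memory each iteration has to step over.

```csharp
using System;
using System.Numerics;

// Hypothetical "fat" entity: everything after Velocity stands in for the many
// components (meshes, textures, ...) that a motion system never reads.
public struct FatEntity
{
    public Vector3 Position;
    public Vector3 Velocity;
    public Matrix4x4 Transform;   // cold data for this system
    public Matrix4x4 BoneOffset;  // cold data for this system
    public long MeshId, TextureId;
}

public static class MotionDemo
{
    // AoS: only Position and Velocity are used, but every iteration strides
    // over the whole entity, so the cold fields share the fetched cache lines.
    public static void IntegrateAoS(FatEntity[] entities, float dt)
    {
        for (var i = 0; i < entities.Length; i++)
        {
            ref var e = ref entities[i];
            e.Position += e.Velocity * dt;
        }
    }

    // SoA: the system walks two tightly packed arrays and never touches
    // the memory that holds the other components.
    public static void IntegrateSoA(Vector3[] positions, Vector3[] velocities, float dt)
    {
        for (var i = 0; i < positions.Length; i++)
        {
            positions[i] += velocities[i] * dt;
        }
    }

    public static void Main()
    {
        const int count = 1000;
        var entities = new FatEntity[count];
        var positions = new Vector3[count];
        var velocities = new Vector3[count];

        for (var i = 0; i < count; i++)
        {
            var v = new Vector3(i, 1, 2);
            entities[i].Velocity = v;
            velocities[i] = v;
        }

        IntegrateAoS(entities, 0.5f);
        IntegrateSoA(positions, velocities, 0.5f);

        // Both layouts produce the same positions; only the memory traffic differs.
        Console.WriteLine(entities[10].Position == positions[10]);
    }
}
```

A benchmark shaped like this, with many cold fields and a system that reads only a couple of hot ones, is the kind of case where the SoA layout would be expected to pull ahead. How large the gap is depends on the hardware and on how fat the entity really is.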

I haven't examined your code in detail, but it looks like you are comparing AoS to SoA while accessing every field of the original structure in both. The first field is an int, which in C# is a 32-bit signed integer. It is followed by a Vector3, which I assume is three 32-bit single-precision floats, and finally a float for health, which in C# is also a 32-bit single-precision float.

So no padding is required for alignment: the structure is 160 bits (20 bytes), every field only requires 32-bit alignment, and you access all of them. From a memory-use and stride standpoint, then, SoA offers no benefit for sequential access here. The only possible win is that the optimizer might emit more efficient SIMD code, which is dubious: optimizing compilers are like wizards in some areas but dumb in others, and you rarely get much benefit from vectorization unless you hand-write the SIMD instructions yourself.
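The size argument can be checked directly. The sketch below assumes `Vector3` is `System.Numerics.Vector3` (three floats) and redeclares the question's `StructEntity` so the snippet stands alone; `LayoutDemo` is a made-up helper name. It compares the AoS stride with the sum of the per-component sizes an SoA layout would use.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

// Same fields, in the same order, as StructEntity in the question.
public struct StructEntity
{
    public int Age;
    public Vector3 Position;
    public float Health;
}

public static class LayoutDemo
{
    // Bytes between consecutive entities in a StructEntity[] (the AoS stride).
    public static int AosBytesPerEntity() => Unsafe.SizeOf<StructEntity>();

    // Bytes per entity when each field lives in its own array (SoA).
    public static int SoaBytesPerEntity() =>
        sizeof(int) + Unsafe.SizeOf<Vector3>() + sizeof(float);

    public static void Main()
    {
        Console.WriteLine($"AoS stride per entity: {AosBytesPerEntity()} bytes");
        Console.WriteLine($"SoA bytes per entity:  {SoaBytesPerEntity()} bytes");
    }
}
```

Under that assumption both numbers come out at 20 bytes, so a loop that reads every field moves the same amount of memory in either layout.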

So think of it this way. If you compare an AoS representation to an SoA one for sequential access, and you use every AoS field in the SoA version too, you will generally only see an improvement if the optimizer can emit more efficient SIMD code as a result, or if SoA reduces the stride by eliminating the padding that non-homogeneous types require for alignment. In a nutshell, and forgive the blunt observation, I don't think you are benchmarking the benefits an ECS delivers in a real-world use case.
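For contrast, here is a small sketch of the padding case, where SoA does shrink the stride even though every field is used. `PaddedEntity` and `PaddingDemo` are hypothetical names, not anything from the question. The exact struct size depends on the runtime's alignment rules, so the code prints it rather than assuming it.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical entity mixing a 1-byte field with an 8-byte field.
// The runtime inserts padding after Flags so that Mass is properly aligned.
public struct PaddedEntity
{
    public byte Flags;
    public double Mass;
}

public static class PaddingDemo
{
    // AoS stride: payload plus whatever alignment padding the runtime adds.
    public static int AosStride() => Unsafe.SizeOf<PaddedEntity>();

    // SoA: a byte[] and a double[] need no per-entity padding.
    public static int SoaBytes() => sizeof(byte) + sizeof(double);

    public static void Main()
    {
        Console.WriteLine($"AoS stride per entity: {AosStride()} bytes");
        Console.WriteLine($"SoA bytes per entity:  {SoaBytes()} bytes");
    }
}
```

On a typical 64-bit runtime the AoS stride should be 16 bytes against 9 bytes of actual payload, so a sequential pass over the SoA arrays reads noticeably less memory. The struct in the question has no such gap to reclaim.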