Allocation-free enumeration and processing

341 Views Asked by At

I would like to solve the massive allocation costs of a c# application. The application itself can be represented by the TickUser class at the bottom, and I'd like to know how to implement the TickStream object and the DoWork and ProcessTick methods to use allocation-free data.

using System;
using System.Collections.Generic;

namespace Ticks {

    public interface ITick {
        double Price { get; }
        double Bid { get; }
        double Ask { get; }
        double Volume { get; }
        DateTime TimeStamp { get; }
    }
    /// <summary>
    /// Allows allocation-free creation of an <see cref="ITick"/>, 
    /// but is inefficient to pass from method to method because it is large.
    /// </summary>
    public readonly struct Tick : ITick {
        public double Price { get; }
        public double Bid { get; }
        public double Ask { get; }
        public double Volume { get; }
        public DateTime TimeStamp { get; }
        public Tick(double price, double bid, double ask, double volume, DateTime timeStamp) {
            Price = price;
            Bid = bid;
            Ask = ask;
            Volume = volume;
            TimeStamp = timeStamp;
        }
    }
    /// <summary>
    /// Also allows allocation-free creation of an <see cref="ITick"/>,
    /// but this is efficient to pass from method to method because it is small,
    /// containing only a single pointer.
    /// It requires a <see cref="Tick"/> to be created once and guaranteed "pinned" to a location on the stack.
    /// </summary>
    public unsafe readonly struct TickPointer : ITick {

        readonly Tick* Ptr;

        public double Price => Ptr->Price;
        public double Bid => Ptr->Bid;
        public double Ask => Ptr->Ask;
        public double Volume => Ptr->Volume;
        public DateTime TimeStamp => Ptr->TimeStamp;

        public TickPointer(Tick* ptr) {
            Ptr = ptr;
        }
    }
    /// <summary>
    /// Generates the data stream.
    /// </summary>
    public class TickStream {

        /// <summary>
        /// Typically returns hundreds of millions of <see cref="ITick"/> values.
        /// To avoid massive allocation/deallocation costs, we want to yield objects that
        /// can only exist on the stack.
        /// </summary>
        public IEnumerable<ITick> GetTicks() {
            // How to implement??
        }
    }

    public class TickUser {

        public void DoWork() {
            var stream = new TickStream();
            foreach (var tick in stream.GetTicks()) {
                ProcessTick(tick);
            }
        }

        void ProcessTick(ITick tick) {

        }
    }
}

Inside the TickStream class, I'm ok with removing the IEnumerable<Tick> GetTicks() method and replacing it with the MoveNext() / Current { get; } / Current()pattern.

Perhaps ref readonly TickPointer Current()?

1

There are 1 best solutions below

0
On

Ok, I figured it out and performed tests with a number of alternate scenarios. It turns out that creating the tick as a ref readonly struct gives best performance in all considerations if one is careful to always pass it around using the in parameter.

Basic code

using System;

namespace Ticks {

    /// <summary>
    /// Allows allocation-free creation of an <see cref="ITick"/>, but is inefficient to pass from method to method because it is large.
    /// </summary>
    public readonly ref struct Tick {
        public static Tick Empty => new Tick(0, 0, 0, 0, DateTime.MinValue);
        public bool IsEmpty => Price == 0;
        public double Price { get; }
        public double Bid { get; }
        public double Ask { get; }
        public double Volume { get; }
        public DateTime TimeStamp { get; }
        public Tick(double price, double bid, double ask, double volume, DateTime timeStamp) {
            Price = price;
            Bid = bid;
            Ask = ask;
            Volume = volume;
            TimeStamp = timeStamp;
        }
    }
    /// <summary>
    /// Also allows allocation-free creation of an <see cref="ITick"/>, but this is efficient to pass from method to method because it is small,
    /// containing only a single pointer.
    /// It requires a <see cref="Tick"/> to be created once and guaranteed "pinned" to a location on the stack.
    /// </summary>
    public unsafe readonly ref struct TickPointer {

        readonly Tick* Ptr;

        public bool IsEmpty => Ptr->IsEmpty;
        public double Price => Ptr->Price;
        public double Bid => Ptr->Bid;
        public double Ask => Ptr->Ask;
        public double Volume => Ptr->Volume;
        public DateTime TimeStamp => Ptr->TimeStamp;

        public TickPointer(Tick* ptr) {
            Ptr = ptr;
        }
    }

    /// <summary>
    /// A reference-type implementation of the tick data for tests involving allocations.
    /// </summary>
    sealed class AllocatedTick {
        public readonly double Price;
        public readonly double Bid;
        public readonly double Ask;
        public readonly double Volume;
        public readonly DateTime TimeStamp;
        public AllocatedTick(double price, double bid, double ask, double volume, DateTime timeStamp) {
            Price = price;
            Bid = bid;
            Ask = ask;
            Volume = volume;
            TimeStamp = timeStamp;
        }
    }
}

Tick stream code

sealed class TestTickStream {

    public readonly int NumTicks;

    readonly DateTime DummyTime = DateTime.MinValue;

    int count = 0;

    public TestTickStream(int numTicks) {
        NumTicks = numTicks;
    }

    public bool MoveNext(ref Tick tick) {
        if (++count > NumTicks) return false;
        tick = new Tick(count, count, count, count, DummyTime);
        return true;
    }
}

/// <summary>
/// A stream that returns <see cref="AllocatedTick"/> ticks for tests involving allocations
/// </summary>
sealed class AllocatedTickStream {

    public readonly int NumTicks;

    readonly DateTime DummyTime = DateTime.MinValue;

    int count = 0;

    public AllocatedTickStream(int numTicks) {
        NumTicks = numTicks;
    }

    public AllocatedTick MoveNext() {
        if (++count > NumTicks) return null;
        return new AllocatedTick(count, count, count, count, DummyTime);
    }
}

Benchmarking code

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using Ticks;

namespace TIcks.Benchmarks {

    [MemoryDiagnoser]
    public class AllocationAndPassingBenchmarks {

        /// <summary>
        /// The number of ticks a stream should generate.
        /// </summary>
        [Params(1000, 100000, 10000000)]
        public int NumTicks { get; set; }

        /// <summary>
        /// The depth of methods that each tick should be processed through. To simulate methods calling methods.
        /// </summary>
        [Params(1, 10, 100)]
        public int MethodPassingDepth { get; set; }

        [Benchmark]
        public void TickPassedWithInParameter() {
            var stream = new TestTickStream(NumTicks);
            var tick = Tick.Empty;
            while (stream.MoveNext(ref tick)) {
                var x = Process(tick, 0);
            }

            Tick Process(in Tick tick, int depth) {
                return depth == MethodPassingDepth ? tick : Process(tick, depth + 1);
            }
        }

        [Benchmark]
        public void TickPassedWithNothing() {
            var stream = new TestTickStream(NumTicks);
            var tick = Tick.Empty;
            while (stream.MoveNext(ref tick)) {
                var x = Process(tick, 0);
            }

            Tick Process(Tick tick, int depth) {
                return depth == MethodPassingDepth ? tick : Process(tick, depth + 1);
            }
        }

        [Benchmark]
        public unsafe void TickPassedAsPointer() {
            var stream = new TestTickStream(NumTicks);
            var tick = Tick.Empty;
            while (stream.MoveNext(ref tick)) {
                var tickPointer = new TickPointer(&tick);
                var x = Process(tickPointer, 0);
            }

            TickPointer Process(TickPointer tickPointer, int depth) {
                return depth == MethodPassingDepth ? tickPointer : Process(tickPointer, depth + 1);
            }
        }

        [Benchmark]
        public void AllocatedTicks() {
            var stream = new AllocatedTickStream(NumTicks);
            for (var tick = stream.MoveNext(); tick is object; tick = stream.MoveNext()) {
                var x = Process(tick, 0);
            }

            AllocatedTick Process(AllocatedTick tick, int depth) {
                return depth == MethodPassingDepth ? tick : ProcessSlowTick(tick, depth + 1);
            }
        }
    }

    [MemoryDiagnoser]
    public class MemberAccessBenchmarks {

        [Params(100, 1000)]
        public int AccessCount { get; set; }

        [Benchmark]
        public void AccessTickMembers() {
            double price, bid, ask, volume;
            DateTime time;
            var tick = new Tick(1, 1, 1, 1, DateTime.Now);
            for (var i = 0; i < AccessCount; i++) {
                price = tick.Price;
                bid = tick.Bid;
                ask = tick.Ask;
                volume = tick.Volume;
                time = tick.TimeStamp;
            }
        }

        [Benchmark]
        public unsafe void AccessTickPointerMembers() {
            double price, bid, ask, volume;
            DateTime time;
            var tick = new Tick(1, 1, 1, 1, DateTime.Now);
            var tickPointer = new TickPointer(&tick);
            for (var i = 0; i < AccessCount; i++) {
                price = tickPointer.Price;
                bid = tickPointer.Bid;
                ask = tickPointer.Ask;
                volume = tickPointer.Volume;
                time = tickPointer.TimeStamp;
            }
        }
    }

    class Program {
        static void Main(string[] args) {
            var summary1 = BenchmarkRunner.Run<AllocationAndPassingBenchmarks>();
            var summary2 = BenchmarkRunner.Run<MemberAccessBenchmarks>();
        }
    }

}

Member access results (used to determine whether there accessing members via TickPointer is too slow, show that its best not to use the TickPointer idea unless absolutely necessary)


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-6820HK CPU 2.70GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
|                   Method | AccessCount |           Mean |        Error |       StdDev |     Gen 0 |     Gen 1 |     Gen 2 |     Allocated |
|------------------------- |------------ |---------------:|-------------:|-------------:|----------:|----------:|----------:|--------------:|
|    **AccessTickMembers** |     **100** |   **374.1 ns** |  **3.96 ns** |  **3.51 ns** |     **-** |     **-** |     **-** |         **-** |
| AccessTickPointerMembers |       100   |     412.0 ns   |    8.64 ns   |    8.88 ns   |         - |         - |         - |             - |
|    **AccessTickMembers** |    **1000** |   **728.5 ns** | **14.41 ns** | **17.70 ns** |     **-** |     **-** |     **-** |         **-** |
| AccessTickPointerMembers |      1000   |   1,013.0 ns   |   19.93 ns   |   22.95 ns   |         - |         - |         - |             - |

Allocation and passing around results (used to determine the best way to pass ticks from method to method as well as rapidly consume hundreds of millions of them from a stream without high allocations)


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-6820HK CPU 2.70GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.101
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
|                        Method |     NumTicks | MethodPassingDepth |                Mean |     Allocated |
|------------------------------ |------------- |------------------- |--------------------:|--------------:|
| **TickPassedWithInParameter** |     **1000** |              **1** |        **13.78 us** |      **32 B** |
|     TickPassedWithNothing     |       1000   |                1   |          17.80 us   |        32 B   |
|       TickPassedAsPointer     |       1000   |                1   |          12.81 us   |        32 B   |
|            AllocatedTicks     |       1000   |                1   |          15.59 us   |     56032 B   |
| **TickPassedWithInParameter** |     **1000** |             **10** |        **17.64 us** |      **32 B** |
|     TickPassedWithNothing     |       1000   |               10   |          44.36 us   |        32 B   |
|       TickPassedAsPointer     |       1000   |               10   |          35.65 us   |        32 B   |
|            AllocatedTicks     |       1000   |               10   |          18.72 us   |     56032 B   |
| **TickPassedWithInParameter** |     **1000** |            **100** |        **64.16 us** |      **32 B** |
|     TickPassedWithNothing     |       1000   |              100   |         368.49 us   |        32 B   |
|       TickPassedAsPointer     |       1000   |              100   |         257.56 us   |        32 B   |
|            AllocatedTicks     |       1000   |              100   |          72.04 us   |     56032 B   |
| **TickPassedWithInParameter** |   **100000** |              **1** |     **1,349.07 us** |      **35 B** |
|     TickPassedWithNothing     |     100000   |                1   |       1,732.55 us   |        41 B   |
|       TickPassedAsPointer     |     100000   |                1   |       1,232.51 us   |        41 B   |
|            AllocatedTicks     |     100000   |                1   |       1,579.70 us   |   5600032 B   |
| **TickPassedWithInParameter** |   **100000** |             **10** |     **1,705.52 us** |      **32 B** |
|     TickPassedWithNothing     |     100000   |               10   |       4,446.53 us   |        32 B   |
|       TickPassedAsPointer     |     100000   |               10   |       3,612.70 us   |        51 B   |
|            AllocatedTicks     |     100000   |               10   |       1,927.15 us   |   5600032 B   |
| **TickPassedWithInParameter** |   **100000** |            **100** |     **6,190.00 us** |      **32 B** |
|     TickPassedWithNothing     |     100000   |              100   |      36,278.23 us   |        32 B   |
|       TickPassedAsPointer     |     100000   |              100   |      23,599.96 us   |        32 B   |
|            AllocatedTicks     |     100000   |              100   |       7,215.91 us   |   5600032 B   |
| **TickPassedWithInParameter** | **10000000** |              **1** |   **133,903.50 us** |      **92 B** |
|     TickPassedWithNothing     |   10000000   |                1   |     174,813.80 us   |        32 B   |
|       TickPassedAsPointer     |   10000000   |                1   |     125,981.61 us   |       342 B   |
|            AllocatedTicks     |   10000000   |                1   |     142,317.59 us   | 560000032 B   |
| **TickPassedWithInParameter** | **10000000** |             **10** |   **173,391.13 us** |    **2056 B** |
|     TickPassedWithNothing     |   10000000   |               10   |     432,315.53 us   |        32 B   |
|       TickPassedAsPointer     |   10000000   |               10   |     361,120.79 us   |        32 B   |
|            AllocatedTicks     |   10000000   |               10   |     192,561.14 us   | 560000032 B   |
| **TickPassedWithInParameter** | **10000000** |            **100** |   **615,150.00 us** |  **1368 B**   |
|     TickPassedWithNothing     |   10000000   |              100   |     3,792,146.58 us |        32 B   |
|       TickPassedAsPointer     |   10000000   |              100   |     2,520,775.19 us |        32 B   |
|            AllocatedTicks     |   10000000   |              100   |       685,623.22 us | 560000032 B   |

In summary, here is the best way I found to code the tick:

public readonly ref struct Tick {
    public double Price { get; }
    public double Bid { get; }
    public double Ask { get; }
    public double Volume { get; }
    public DateTime TimeStamp { get; }
    public Tick(double price, double bid, double ask, double volume, DateTime timeStamp) {
        Price = price;
        Bid = bid;
        Ask = ask;
        Volume = volume;
        TimeStamp = timeStamp;
    }
}

Here is the best way I found to code the stream:

sealed class TestTickStream {

    public readonly int NumTicks;

    readonly DateTime DummyTime = DateTime.MinValue;

    int count = 0;

    public TestTickStream(int numTicks) {
        NumTicks = numTicks;
    }

    public bool MoveNext(ref Tick tick) {
        if (++count > NumTicks) return false;
        tick = new Tick(count, count, count, count, DummyTime);
        return true;
    }
}

Here is the best way that I found to consume the stream:

public void ProcessAllTicks() {
    var stream = new TestTickStream(NumTicks);
    var tick = new Tick();
    while (stream.MoveNext(ref tick)) {
        Process(tick);
    }

    void Process(in Tick tick) {
        // do semething
    }
}

If you can show some ways to improve this code, please share!