Consider the following C# implementation (not my own) of Levenshtein distance, taken from Fastenshtein's StaticLevenshtein.cs. I removed the original comments and eliminated some whitespace for brevity. Source: https://github.com/DanHarltey/Fastenshtein/blob/master/src/Fastenshtein/StaticLevenshtein.cs

// Fastenshtein/StaticLevenshtein.cs
public static int Distance(string value1, string value2)
{
  if (value2.Length == 0) { return value1.Length; }

  // *** NOTE: "costs" is heap-allocated. ***
  int[] costs = new int[value2.Length];
  
  // Initializes costs[i] = i + 1: the distance from the empty prefix
  // of value1 to the first i + 1 characters of value2.
  for (int i = 0; i < costs.Length;) { costs[i] = ++i; }

  for (int i = 0; i < value1.Length; i++)
  {
    int cost = i;
    int previousCost = i;
    char value1Char = value1[i];

    for (int j = 0; j < value2.Length; j++)
    {
      int currentCost = cost;
      cost = costs[j];

      if (value1Char != value2[j])
      {
        if (previousCost < currentCost) { currentCost = previousCost; }
        if (cost < currentCost) { currentCost = cost; }
        ++currentCost;
      }

      costs[j] = currentCost;
      previousCost = currentCost;
    }
  }

  return costs[costs.Length - 1];
}
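
For a quick sanity check of what Distance computes, the classic example pair gives:

// Sanity check (not part of the original source).
Console.WriteLine(Distance("kitten", "sitting")); // 3 (substitute k->s, e->i, insert g)
Console.WriteLine(Distance("foofaraw", "foo"));   // 5 (delete the trailing "faraw")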

I noticed that the "costs" array is heap-allocated. I wondered whether, for use cases dealing strictly with small strings (32 characters or fewer), an implementation using a stack-allocated "costs" array would improve performance.

For my stack-allocated "costs" implementation, I simply replaced int[] costs = new int[value2.Length]; with Span<int> costs = stackalloc int[value2.Length];:

// Stack-allocated "costs"
public static int Distance_StackAlloc(string value1, string value2)
{
  if (value2.Length == 0) { return value1.Length; }

  // *** NOTE: "costs" is now stack-allocated. ***
  Span<int> costs = stackalloc int[value2.Length];

  // ... remaining code is identical to the original, and 
  // is omitted here for brevity.
}
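
As an aside: since a stackalloc with an unchecked length can overflow the stack, a production version would presumably guard the allocation behind a length cutoff. Here is a minimal sketch of that pattern (not the variant I benchmarked; it assumes C# 8 or later, and the 32-character cutoff comes from my use case):

// Hypothetical guarded variant (NOT benchmarked): fall back to the heap
// when value2 exceeds a small-string cutoff, so stackalloc cannot
// overflow the stack on long inputs.
public static int Distance_GuardedStackAlloc(string value1, string value2)
{
  if (value2.Length == 0) { return value1.Length; }

  const int StackLimit = 32;
  Span<int> costs = value2.Length <= StackLimit
    ? stackalloc int[StackLimit]
    : new int[value2.Length];
  costs = costs.Slice(0, value2.Length);

  // The rest is identical to the original implementation above.
  for (int i = 0; i < costs.Length;) { costs[i] = ++i; }

  for (int i = 0; i < value1.Length; i++)
  {
    int cost = i;
    int previousCost = i;
    char value1Char = value1[i];

    for (int j = 0; j < value2.Length; j++)
    {
      int currentCost = cost;
      cost = costs[j];

      if (value1Char != value2[j])
      {
        if (previousCost < currentCost) { currentCost = previousCost; }
        if (cost < currentCost) { currentCost = cost; }
        ++currentCost;
      }

      costs[j] = currentCost;
      previousCost = currentCost;
    }
  }

  return costs[costs.Length - 1];
}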

To compare performance, I used BenchmarkDotNet (https://benchmarkdotnet.org/index.html). I loaded 1,000 random words from words_alpha.txt (https://github.com/dwyl/english-words/blob/master/words_alpha.txt) into a string[] and measured the time taken to sum the Levenshtein distances from each word to "foofaraw". Here is the setup code:

[Params(1000)] public int N;
[Params("foofaraw")] public string Word;
private string[] _words;

[GlobalSetup]
public void Setup()
{
  // Same seed so each test uses the same data.
  Random rng = new Random(2112340164);
  _words =
    File
    .ReadAllLines("words_alpha.txt") // shortened path
    .OrderBy(_ => rng.Next())
    .Take(N)
    .ToArray();
}
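
For context, the allocation and hardware-counter columns in the results below come from BenchmarkDotNet diagnosers. The benchmark class is decorated roughly as follows (the class name here is just a placeholder; the hardware counters require Windows/ETW and elevated privileges):

// Requires BenchmarkDotNet.Attributes and BenchmarkDotNet.Diagnosers.
// "Alloc" comes from [MemoryDiagnoser]; the per-op counters come from
// [HardwareCounters].
[MemoryDiagnoser]
[HardwareCounters(
  HardwareCounter.BranchMispredictions,
  HardwareCounter.InstructionRetired,
  HardwareCounter.CacheMisses)]
public class LevenshteinBenchmarks // placeholder name
{
  // Params, GlobalSetup, and Benchmark members shown in this post.
}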

And here is the pseudocode for each benchmark, where Levenshtein stands in for the implementation under test:

[Benchmark]
public int Test()
{
  int sum = 0;
  foreach (string word in _words) { sum += Levenshtein(Word, word); }
  return sum;
}
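
Concretely, the two benchmark methods differ only in which implementation they call; their names match the results below:

[Benchmark]
public int StaticLevenshtein()
{
  int sum = 0;
  foreach (string word in _words) { sum += Distance(Word, word); }
  return sum;
}

[Benchmark]
public int StackAllocCosts()
{
  int sum = 0;
  foreach (string word in _words) { sum += Distance_StackAlloc(Word, word); }
  return sum;
}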

Surprisingly, the heap-allocated version seems to outperform the stack-allocated version! Below are the results from BenchmarkDotNet. Some observations:

  • The heap-allocated version allocates 63,840 bytes across 1,000 invocations (≈64 B per call, which on 64-bit .NET is consistent with one int[] per call: roughly 24 B of array overhead plus 4 B per element, i.e. an average word length of about 10 characters).

  • The stack-allocated version incurs roughly 50% more branch mispredictions per operation (15,368 vs. 10,191), despite retiring fewer instructions and having far fewer cache misses.

N: 1000, Word: "foofaraw"

| Method            | Mean     | Mispredictions/Op | Instructions/Op | CacheMisses/Op | Allocated |
|------------------ |---------:|------------------:|----------------:|---------------:|----------:|
| StaticLevenshtein | 180.3 μs | 10,191            | 361,551         | 978            | 63,840 B  |
| StackAllocCosts   | 199.0 μs | 15,368            | 335,321         | 37             | -         |

Can anyone shed some light on the results?


Edit: the following question and its answers seem to describe the behavior I expected to see; I posted this question because my benchmark results do not match that expectation: Which is faster: Stack allocation or Heap allocation


Edit: Here are the remaining statistics from the BenchmarkDotNet results:

| Method            | Mean     | Error   | StdDev  | Median   |
|------------------ |---------:|--------:|--------:|---------:|
| StaticLevenshtein | 180.3 μs | 3.50 μs | 5.45 μs | 177.6 μs |
| StackAllocCosts   | 199.0 μs | 3.24 μs | 4.10 μs | 198.4 μs |