Consider the following C# implementation (not my own) of Levenshtein distance: "Fastenshtein/StaticLevenshtein.cs". I removed the original comments and eliminated some whitespace for brevity. Source: https://github.com/DanHarltey/Fastenshtein/blob/master/src/Fastenshtein/StaticLevenshtein.cs.
```csharp
// Fastenshtein/StaticLevenshtein.cs
public static int Distance(string value1, string value2)
{
    if (value2.Length == 0) { return value1.Length; }
    // *** NOTE: "costs" is heap-allocated. ***
    int[] costs = new int[value2.Length];
    for (int i = 0; i < costs.Length;) { costs[i] = ++i; }
    for (int i = 0; i < value1.Length; i++)
    {
        int cost = i;
        int previousCost = i;
        char value1Char = value1[i];
        for (int j = 0; j < value2.Length; j++)
        {
            int currentCost = cost;
            cost = costs[j];
            if (value1Char != value2[j])
            {
                if (previousCost < currentCost) { currentCost = previousCost; }
                if (cost < currentCost) { currentCost = cost; }
                ++currentCost;
            }
            costs[j] = currentCost;
            previousCost = currentCost;
        }
    }
    return costs[costs.Length - 1];
}
```
I noticed that the "costs" array is heap-allocated. I wondered whether, for use cases dealing strictly with small strings (32 characters or fewer), an implementation using a stack-allocated "costs" array would improve performance.
For my stack-allocated "costs" implementation, I simply replaced `int[] costs = new int[value2.Length];` with `Span<int> costs = stackalloc int[value2.Length];`:
```csharp
// Stack-allocated "costs"
public static int Distance_StackAlloc(string value1, string value2)
{
    if (value2.Length == 0) { return value1.Length; }
    // *** NOTE: "costs" is now stack-allocated. ***
    Span<int> costs = stackalloc int[value2.Length];
    // ... remaining code is identical to the original, and
    // is omitted here for brevity.
}
```
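A third variant that may be worth benchmarking alongside the two above (my own sketch, not from Fastenshtein): `stackalloc int[value2.Length]` is a variable-length stack allocation, whereas a constant-size `stackalloc int[32]` guarded by a length check lets the JIT reserve a fixed amount of stack up front. The 32 mirrors the small-string assumption, and longer inputs fall back to the heap; the loop body is byte-for-byte the original:

```csharp
using System;

// Hypothetical class/method names; sketch only, assuming C# 8 or later.
public static class LevenshteinSketch
{
    public static int Distance_FixedStackAlloc(string value1, string value2)
    {
        if (value2.Length == 0) { return value1.Length; }

        // Constant-size stackalloc for short inputs (no variable-length
        // stack adjustment); heap fallback for anything longer than 32.
        Span<int> costs = value2.Length <= 32
            ? stackalloc int[32]
            : new int[value2.Length];
        costs = costs.Slice(0, value2.Length);

        for (int i = 0; i < costs.Length;) { costs[i] = ++i; }
        for (int i = 0; i < value1.Length; i++)
        {
            int cost = i;
            int previousCost = i;
            char value1Char = value1[i];
            for (int j = 0; j < value2.Length; j++)
            {
                int currentCost = cost;
                cost = costs[j];
                if (value1Char != value2[j])
                {
                    if (previousCost < currentCost) { currentCost = previousCost; }
                    if (cost < currentCost) { currentCost = cost; }
                    ++currentCost;
                }
                costs[j] = currentCost;
                previousCost = currentCost;
            }
        }
        return costs[costs.Length - 1];
    }
}
```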
To compare performance, I used BenchmarkDotNet (https://benchmarkdotnet.org/index.html). I loaded 1,000 random words from words_alpha.txt (https://github.com/dwyl/english-words/blob/master/words_alpha.txt) into a string[] and measured the time to sum the Levenshtein distances from each word to "foofaraw". Here is the setup code:
```csharp
[Params(1000)] public int N;
[Params("foofaraw")] public string Word;

private string[] _words;

[GlobalSetup]
public void Setup()
{
    // Same seed so each test uses the same data.
    Random rng = new Random(2112340164);
    _words =
        File
        .ReadAllLines("words_alpha.txt") // shortened path
        .OrderBy(_ => rng.Next())
        .Take(N)
        .ToArray();
}
```
And here is the pseudocode for each benchmark, where `Levenshtein` stands in for whichever implementation is being measured:
```csharp
[Benchmark]
public int Test()
{
    int sum = 0;
    foreach (string word in _words) { sum += Levenshtein(Word, word); }
    return sum;
}
```
Surprisingly, the heap-allocated version seems to outperform the stack-allocated version! Below are the results from BenchmarkDotNet. Some observations:

- The heap-allocated version allocates 63,840 bytes per benchmark invocation; spread over the 1,000 words, that is roughly 64 bytes per `Distance` call, consistent with one small `int[]` (object header plus length plus data) per call.
- The stack-allocated version has roughly 50% more branch mispredictions (15,368 vs. 10,191 per op).
N: 1000, Word: "foofaraw"

| Method            |     Mean | Mispredictions/Op | Instructions/Op | CacheMisses/Op |    Alloc |
|------------------ |---------:|------------------:|----------------:|---------------:|---------:|
| StaticLevenshtein | 180.3 μs |            10,191 |         361,551 |            978 | 63,840 B |
| StackAllocCosts   | 199.0 μs |            15,368 |         335,321 |             37 |        - |
Can anyone shed some light on the results?
Edit: the following question and its answers seem to describe the behavior I would expect to see; I posted this question because my benchmark results do not match it: Which is faster: Stack allocation or Heap allocation
Edit: Here are the remaining statistics from the BenchmarkDotNet results:
| Method | Mean | Error | StdDev | Median |
|------------------ |---------:|--------:|--------:|---------:|
| StaticLevenshtein | 180.3 μs | 3.50 μs | 5.45 μs | 177.6 μs |
| StackAllocCosts | 199.0 μs | 3.24 μs | 4.10 μs | 198.4 μs |