I'm testing the capabilities of the .NET C# System.Numerics.Vector class for packing and unpacking bits.
I was hoping for Vector bitwise shift left/right functionality, but that is not currently available, so I tried to simulate shifting using arithmetic and logical methods, as below. Here's what I saw:
Packing (a simulated bitwise SHIFT LEFT and OR) using Vector.Multiply() and Vector.BitwiseOr() is slightly worse* than array/pointer code.
*<10% degradation in throughput (MB/sec).
But unpacking (a simulated bitwise SHIFT RIGHT and AND) using Vector.Divide() and Vector.BitwiseAnd() is far worse** than array/pointer code.
**50% degradation in throughput
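For illustration, the simulated shifts described above could look roughly like the following. This is a minimal sketch, not the code that was benchmarked: the class name, the method names, and the 4-bit shift are assumptions made for the example. It multiplies by 2^shift in place of SHIFT LEFT and divides by 2^shift in place of SHIFT RIGHT, using Vector<uint> so the division behaves like a logical shift.

```csharp
using System;
using System.Numerics;

static class ShiftSimulationSketch
{
    // Simulated (hi << shift) | lo: multiply every lane by 2^shift, then OR in the low bits.
    public static Vector<uint> Pack(Vector<uint> hi, Vector<uint> lo, int shift)
        => Vector.BitwiseOr(Vector.Multiply(hi, new Vector<uint>(1u << shift)), lo);

    // Simulated (packed >> shift) & mask: divide every lane by 2^shift, then AND with the mask.
    public static Vector<uint> UnpackHigh(Vector<uint> packed, int shift, uint mask)
        => Vector.BitwiseAnd(
               Vector.Divide(packed, new Vector<uint>(1u << shift)),
               new Vector<uint>(mask));

    static void Main()
    {
        var hi = new Vector<uint>(9u);   // hypothetical value, broadcast to every lane
        var lo = new Vector<uint>(5u);   // hypothetical value, broadcast to every lane

        Vector<uint> packed = Pack(hi, lo, 4);            // 9 * 16 | 5 = 149 per lane
        Vector<uint> back = UnpackHigh(packed, 4, 0xFu);  // 149 / 16 & 15 = 9 per lane

        Console.WriteLine($"{packed[0]} -> {back[0]}");   // prints "149 -> 9"
    }
}
```

The multiply wraps on overflow for unsigned lanes, which matches what a left shift would discard, so the two forms should agree for uint data.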
NB:
Vector was tested using uint (this was also raised in comments).
The test basis was packing and unpacking 100 million up to 1 billion integers in blocks of 65536 integers. I randomly generated the int[] for each block.
I also tested bitwise (& | >> <<) as well as arithmetic (+ - * /) operations and saw no marked difference in cost. Even divide was not that bad, with only a 10% degradation in throughput vs multiply (the question of division was raised in comments).
I changed my original test code (for the non-Vector comparison) to an unsafe/pointer routine to create more of a like-for-like test in terms of packing (many integers to a word) versus unpacking (a word to many integers). This brought the difference in throughput between packing and unpacking for the non-Vector code down to <5% (which counters my comment about the compiler and optimization below).
Non-Optimized Vector: packing is 2x as fast as unpacking
Optimized Vector: yielded a 4x improvement (versus non-optimized Vector) in packing and a 2x improvement for unpacking
Non-Optimized array/pointer: unpacking is ~5% faster than packing
Optimized array/pointer: yielded a 3x improvement (versus non-optimized array/pointer) for packing and a 2.5x improvement for unpacking. Overall, optimized array/pointer packing was <5% faster than optimized array/pointer unpacking.
Optimized array/pointer packing was ~10% faster than an Optimized Vector pack
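To make the non-Vector baseline concrete, here is a rough sketch of scalar packing (many integers to a word) and unpacking (a word to many integers). It is not the routine that was measured: it uses safe array indexing rather than unsafe pointers so it compiles without extra flags, and the fixed 16-bit width, the class name, and the method names are assumptions chosen for the example.

```csharp
using System;

static class ScalarPackSketch
{
    const int Bits = 16;                    // assumed fixed width per value
    const int PerWord = 32 / Bits;          // values stored in each 32-bit word
    const uint Mask = (1u << Bits) - 1u;    // low Bits bits set

    // Packing: SHIFT LEFT each value into its slot and OR it into the word.
    public static uint[] Pack(int[] data)
    {
        var words = new uint[(data.Length + PerWord - 1) / PerWord];
        for (int i = 0; i < data.Length; i++)
        {
            words[i / PerWord] |= ((uint)data[i] & Mask) << ((i % PerWord) * Bits);
        }
        return words;
    }

    // Unpacking: SHIFT RIGHT the word to the slot and AND with the mask.
    public static int[] Unpack(uint[] words, int count)
    {
        var data = new int[count];
        for (int i = 0; i < count; i++)
        {
            data[i] = (int)((words[i / PerWord] >> ((i % PerWord) * Bits)) & Mask);
        }
        return data;
    }

    static void Main()
    {
        int[] data = { 14600, 14800, 14700 };   // values in the range used by the test
        int[] roundTrip = Unpack(Pack(data), data.Length);
        Console.WriteLine(string.Join(", ", roundTrip));
    }
}
```

Both directions do one shift and one bitwise operation per value, which is consistent with the small gap reported between scalar packing and unpacking.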
Conclusion so far:
Vector.Divide() appears to be a comparatively slower implementation vs a normal arithmetic division
Furthermore, the compiler does not appear to optimize Vector.Divide() code to anywhere near the same extent as Vector.Multiply() (which supports comments below regarding the optimising of division)
Array/pointer processing is at present slightly faster than the Vector class for packing data and significantly faster for unpacking
System.Numerics needs Vector.ShiftLeft() & Vector.ShiftRight() methods
Question (updated):
- Is my conclusion roughly on track, or are there other aspects to check/consider?
Further Information:
// Requires: using System; using System.Diagnostics;
int numPages = 8192; // up to >15K
int testSize = 65536;
Stopwatch swPack = new Stopwatch();
Stopwatch swUnpack = new Stopwatch();
long byteCount = 0;
for (int p = 0; p < numPages; p++)
{
    int[] data = GetRandomIntegers(testSize, 14600, 14800);
    swPack.Start();
    byte[] compressedBytes = pack(data);
    swPack.Stop();
    swUnpack.Start();
    int[] unpackedInts = unpack(compressedBytes);
    swUnpack.Stop();
    byteCount += (data.Length * 4); // 4 bytes per int
}
// bytes / 1000 / milliseconds == MB/sec (integer division truncates)
Console.WriteLine("Packing Throughput (MB/sec): " + byteCount / 1000 / swPack.ElapsedMilliseconds);
Console.WriteLine("Unpacking Throughput (MB/sec): " + byteCount / 1000 / swUnpack.ElapsedMilliseconds);
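The throughput figure above is bytes divided by elapsed time, computed with integer division. A small helper along these lines could report it as a floating-point value instead. This is only a sketch: the helper name and the 2-second timing are hypothetical, and 1 MB is taken as 1,000,000 bytes to match the division by 1000 and milliseconds in the loop above.

```csharp
using System;

static class ThroughputSketch
{
    // MB/sec with 1 MB = 1,000,000 bytes, matching bytes / 1000 / milliseconds.
    public static double MegabytesPerSecond(long byteCount, TimeSpan elapsed)
        => byteCount / 1_000_000.0 / elapsed.TotalSeconds;

    static void Main()
    {
        // 8192 pages of 65536 ints at 4 bytes each, as in the test loop.
        long bytes = 8192L * 65536 * sizeof(int);

        // Hypothetical elapsed time, for illustration only.
        Console.WriteLine(MegabytesPerSecond(bytes, TimeSpan.FromSeconds(2)));
    }
}
```

In the benchmark itself, the elapsed argument would come from swPack.Elapsed or swUnpack.Elapsed.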
Vector.Divide has no hardware acceleration for integer types, so it is very slow. It wasn't until .NET 7.0 that Vector added the ShiftRightArithmetic and ShiftRightLogical methods. I developed the VectorTraits library, which allows programs on earlier .NET versions (.NET Core 3.0+, .NET 5.0+) to use hardware-accelerated ShiftRightArithmetic and ShiftRightLogical methods. https://www.nuget.org/packages/VectorTraits
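On .NET 7.0 or later, the built-in shift methods can replace the multiply/divide workaround directly. The following is a minimal sketch using only System.Numerics (it does not use the VectorTraits API); the class name, the method names, and the sample values are assumptions made for the example.

```csharp
using System;
using System.Numerics;

static class BuiltInShiftSketch
{
    // (hi << bits) | lo using the .NET 7+ Vector.ShiftLeft method.
    public static Vector<uint> Pack(Vector<uint> hi, Vector<uint> lo, int bits)
        => Vector.BitwiseOr(Vector.ShiftLeft(hi, bits), lo);

    // packed >> bits using the .NET 7+ Vector.ShiftRightLogical method.
    public static Vector<uint> UnpackHigh(Vector<uint> packed, int bits)
        => Vector.ShiftRightLogical(packed, bits);

    // packed & ((1 << bits) - 1) keeps only the low field.
    public static Vector<uint> UnpackLow(Vector<uint> packed, int bits)
        => Vector.BitwiseAnd(packed, new Vector<uint>((1u << bits) - 1u));

    static void Main()
    {
        var hi = new Vector<uint>(9u);   // hypothetical value, broadcast to every lane
        var lo = new Vector<uint>(5u);   // hypothetical value, broadcast to every lane

        Vector<uint> packed = Pack(hi, lo, 4);   // 9 << 4 | 5 = 149 per lane
        Console.WriteLine($"{packed[0]} -> {UnpackHigh(packed, 4)[0]} / {UnpackLow(packed, 4)[0]}");
        // prints "149 -> 9 / 5"
    }
}
```

Whether this closes the gap with array/pointer code would still need to be measured with the same benchmark.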