Matrix multiplication in C# - is there a better way than copying to a jagged array?

56 Views Asked by At

Beginner here. Sorry.

Old method:

float[] Matrix = new float[dimension * dimension];

// Matrix is square
// 1D array is much faster than 2D/jagged array for this step
// Matrix will be multiplied by itself

float[] Result = new float[dimension * dimension];

_ = Parallel.For(0, dimension, i =>
{
    float[] column = new float[dimension];
    for (int j = 0; j < dimension; j++)
    {
        column[j] = Matrix[j * dimension + i];
    }
    for (int j = 0; j < dimension; j++)
    {
        float DotProd = 0;
        for (int k = 0; k < dimension; k++)
        {
            DotProd += Matrix[j * dimension + k] * column[k];
        }
        Result[j * dimension + i] = DotProd;
    }
}

It makes sense to me why making the "column" array beforehand improves performance: this array puts all of the necessary data in one location, which is then stored in the CPU cache. The performance improvement when I did this was enormous. Calculation time went from 67 seconds to 30 seconds.

So, seeing that it worked so well for "columns", I tried it with the "rows" as well. This was done by copying the 1D array into a Jagged array, like so:

float[] Matrix = new float[dimension * dimension];

// Matrix is square
// 1D array is much faster than 2D/jagged array when filling it with data
// Matrix will be multiplied by itself

float[] Result = new float[dimension * dimension];

float[][] JagMatrix = new float[dimension][];
for (int i = 0; i < dimension; i++)
{
    JagMatrix[i] = new float[dimension];
    Array.Copy(Matrix, dimension * i, JagMatrix[i], 0, dimension);
}

_ = Parallel.For(0, dimension, i =>
{
    float[] column = new float[dimension];
    for (int j = 0; j < dimension; j++)
    {
        column[j] = Matrix[j * dimension + i];
    }
    for (int j = 0; j < dimension; j++)
    {
        float DotProd = 0;
        float[] row = JagMatrix[j];
        for (int k = 0; k < dimension; k++)
        {
            DotProd += row[k] * column[k];
        }
        Result[j * dimension + i] = DotProd;
    }
}

This is about 30% faster again, presumably for the same reason: it ensures that the "row" is stored in CPU cache.

However, I'm wondering if there's a way to get the "row" into cache without copying the matrix to a jagged array. Besides the (probably negligible) performance hit from setting it up, it doesn't seem like it should be necessary. Unlike a "column", the data for a "row" should already be in the one location (as indicated by the much smaller performance improvement). So, is there a way to get it into cache without creating the jagged array?

It can't be a jagged array from the start as it's faster to fill the matrix when it's a 1D array.

I've tried using pointers and spans, but both were slower.

0

There are 0 best solutions below