How to use OpenBLAS to improve vectorized operations?


I am self-learning how to write efficient, optimized deep learning code; but I am very much a newbie at this.

For example: I am reading that numpy uses vectorization to avoid python loops.

According to that link, they also pretty much coined the term broadcasting, which is now used by TensorFlow, PyTorch and others.

I did some digging, and found that ldd on my Debian box shows multiarray.so links libopenblasp-r0-39a31c03.2.18.so.

So let's take the use case of a matrix subtraction. I would like to understand how to use OpenBLAS to improve this very naive implementation:

void matrix_sub(Matrix *a, Matrix *b, Matrix *res)
{
  assert(a->cols == b->cols);
  assert(a->rows == b->rows);

  zero_out_data(res, a->rows, a->cols);

  for (int i = 0; i < (a->rows*a->cols); i++)
    {
      res->data[i] = a->data[i] - b->data[i];
    }
}

Likewise an inner product, or an addition?
