I am teaching myself how to write efficient, optimized deep learning code, but I am very much a newbie at this.
For example, I am reading that NumPy uses vectorization to avoid Python loops. According to that link, NumPy also pretty much coined the term broadcasting, which is now used by TensorFlow, PyTorch, and others.
I did some digging and found that running ldd on my Debian box shows that multiarray.so links against libopenblasp-r0-39a31c03.2.18.so.
So let's take the use case of matrix subtraction. I would like to understand how to use OpenBLAS to improve this very naive implementation:
#include <assert.h>

/* Matrix holds rows, cols, and a flat row-major data array. */
void matrix_sub(Matrix *a, Matrix *b, Matrix *res)
{
    assert(a->cols == b->cols);
    assert(a->rows == b->rows);
    zero_out_data(res, a->rows, a->cols); /* clears res before the loop */
    for (int i = 0; i < (a->rows * a->cols); i++)
    {
        res->data[i] = a->data[i] - b->data[i];
    }
}
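I am not sure this is the right approach, but here is a minimal sketch of how I imagine the BLAS route would look, assuming the data array holds doubles and that the CBLAS calls cblas_dcopy and cblas_daxpy are the right ones to use. Since BLAS has no dedicated subtraction routine, my idea is to copy a into res and then add (-1.0) * b:

#include <assert.h>
#include <cblas.h>

/* Sketch only: assumes the same Matrix struct as above, with
 * a flat double array in Matrix.data. Expresses res = a - b as
 * a copy followed by an axpy with alpha = -1. */
void matrix_sub_blas(Matrix *a, Matrix *b, Matrix *res)
{
    assert(a->rows == b->rows && a->cols == b->cols);
    int n = a->rows * a->cols;
    cblas_dcopy(n, a->data, 1, res->data, 1);       /* res <- a       */
    cblas_daxpy(n, -1.0, b->data, 1, res->data, 1); /* res <- res - b */
}

I assume I would then link with something like gcc matrix.c -lopenblas on Debian, but please correct me if that is wrong.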
Likewise, how would I do this for an inner product, or an addition?
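From the docs, it looks like the level-1 routines cblas_ddot and cblas_daxpy would cover those two cases; here is my guess at what that would look like, again assuming double data and the same Matrix struct:

#include <cblas.h>

/* Inner product over the flattened data: sum_i a[i] * b[i]. */
double inner_product_blas(Matrix *a, Matrix *b)
{
    int n = a->rows * a->cols;
    return cblas_ddot(n, a->data, 1, b->data, 1);
}

/* Addition: res = a + b, same copy-then-axpy trick, alpha = +1. */
void matrix_add_blas(Matrix *a, Matrix *b, Matrix *res)
{
    int n = a->rows * a->cols;
    cblas_dcopy(n, a->data, 1, res->data, 1);      /* res <- a       */
    cblas_daxpy(n, 1.0, b->data, 1, res->data, 1); /* res <- res + b */
}

Is this the intended way to use OpenBLAS for element-wise operations, or is there a better pattern?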