I'm working on a high-performance matrix multiplication library.
I have implemented a function matmul(float* A, float* B, float* C, int M, int N, int K) that multiplies A of shape [M, K] by B of shape [K, N] and writes the result to C of shape [M, N].
In practice, the float* A argument comes from Tensor.data(), where Tensor is a class and .data() is a member function that returns a pointer to where the data is actually stored. I have also implemented a transpose() function. A simplified implementation of Tensor is as follows:
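#include <cstdint>
#include <utility>
#include <vector>
class Tensor {
public:
    // Returns the pointer to the underlying buffer.
    void* data() const { return data_; }
    // Swaps two dimensions by permuting metadata only; the buffer is untouched.
    void transpose(uint32_t dim0, uint32_t dim1) {
        std::swap(stride[dim0], stride[dim1]);
        std::swap(shape[dim0], shape[dim1]);
    }
private:
    void* data_;  // named data_ so it does not clash with data()
    std::vector<uint32_t> stride;
    std::vector<uint32_t> shape;
};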
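To make the issue concrete, here is a minimal sketch of what transpose() does and does not change. TensorSketch and is_contiguous are hypothetical names used only for illustration, the members are public for brevity, and strides are assumed to be counted in elements rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Tensor class, with public members for brevity.
struct TensorSketch {
    void* data;
    std::vector<uint32_t> stride;  // assumed to be in elements, not bytes
    std::vector<uint32_t> shape;

    // Permutes the metadata only; the buffer behind `data` is never touched.
    void transpose(uint32_t dim0, uint32_t dim1) {
        assert(dim0 < shape.size() && dim1 < shape.size());
        std::swap(stride[dim0], stride[dim1]);
        std::swap(shape[dim0], shape[dim1]);
    }
};

// Returns true if the strides describe a dense row-major layout,
// which is the layout a plain pointer-based matmul implicitly expects.
bool is_contiguous(const TensorSketch& t) {
    uint32_t expected = 1;
    for (std::size_t i = t.shape.size(); i-- > 0;) {
        if (t.stride[i] != expected) return false;
        expected *= t.shape[i];
    }
    return true;
}
```

After a transpose, the data pointer is the same as before, but the view is no longer row-major contiguous, so passing the raw pointer to a function that assumes a dense row-major layout reads the elements in the wrong order.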
So my question is: given Tensor A[M, K] and Tensor B[N, K], I want to call matmul on A.data() and B.transpose().data(). However, transpose() only swaps the stride and shape entries; it doesn't change what B.data() points to, so matmul still reads B in its original [N, K] layout and the call cannot produce the right result. How should I change this so that it computes the matrix product of A and B.transpose()?
I'm not sure how best to deal with this.
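To pin down the result I'm after, here is a naive reference written as a sketch. It assumes float data, strides counted in elements, and a contiguous row-major C. The function matmul_strided and its stride parameters are hypothetical names, not part of my library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical naive reference: C[M, N] = A[M, K] * B[K, N], where A and B
// are addressed through explicit row/column strides (in elements).
void matmul_strided(const float* A, uint32_t a_row_stride, uint32_t a_col_stride,
                    const float* B, uint32_t b_row_stride, uint32_t b_col_stride,
                    float* C, int M, int N, int K) {
    assert(M >= 0 && N >= 0 && K >= 0);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                const std::size_t a_idx =
                    static_cast<std::size_t>(m) * a_row_stride +
                    static_cast<std::size_t>(k) * a_col_stride;
                const std::size_t b_idx =
                    static_cast<std::size_t>(k) * b_row_stride +
                    static_cast<std::size_t>(n) * b_col_stride;
                acc += A[a_idx] * B[b_idx];
            }
            // C is assumed to be contiguous row-major with shape [M, N].
            C[static_cast<std::size_t>(m) * N + n] = acc;
        }
    }
}
```

If B is stored contiguously as [N, K], its transposed view has row stride 1 and column stride K, so the call would be matmul_strided(A, K, 1, B, 1, K, C, M, N, K). This loop is far slower than an optimized matmul; it is only meant to define the expected output.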