memory layout for matrix multiplication with a transposed tensor


I'm currently working on a high-performance matrix multiplication library. I have implemented a function matmul(float* A, float* B, float* C, int M, int N, int K) that performs the matrix multiplication where A is [M, K], B is [K, N], and the result C is [M, N].
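For reference, the real kernel is optimized, but semantically it behaves like this naive row-major reference (a sketch I'm adding only to pin down the layout assumptions; all three buffers are assumed contiguous):

#include <vector>

// Naive reference semantics of my kernel (the real one is optimized).
void matmul(float* A, float* B, float* C, int M, int N, int K) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];   // row-major, contiguous
            C[m * N + n] = acc;
        }
}

int main() {
    int M = 2, N = 3, K = 4;
    std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);
    matmul(A.data(), B.data(), C.data(), M, N, K);    // every C entry becomes K = 4
}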

In practice, the float* A actually comes from Tensor.data(), where Tensor is a class and .data() is the member function that returns the location where the data is actually stored. I have also implemented a transpose() function. A simplified implementation of Tensor follows:

#include <cstdint>
#include <utility>
#include <vector>

class Tensor {
public:
    std::vector<uint32_t> stride;
    std::vector<uint32_t> shape;

    // Returns the location where the element data is actually stored.
    void* data() { return data_; }

    // Only swaps the bookkeeping; the underlying storage is untouched.
    void transpose(uint32_t dim0, uint32_t dim1) {
        std::swap(stride[dim0], stride[dim1]);
        std::swap(shape[dim0], shape[dim1]);
    }
private:
    void* data_ = nullptr;
};
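
For example (a small sketch assuming the class above), this is what transpose() does to the metadata of a [N, K] tensor:

// Sketch: B is stored row-major as [N, K], so stride = {K, 1}.
void example_transpose() {
    uint32_t N = 4, K = 8;
    Tensor B;
    B.shape  = {N, K};
    B.stride = {K, 1};
    B.transpose(0, 1);   // shape -> {K, N}, stride -> {1, K}
    // The pointer returned by B.data() is unchanged; only the metadata moved.
}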

So my question is: if I have Tensor A of shape [M, K] and Tensor B of shape [N, K], I want to do matmul(A.data(), B.transpose().data()), but transpose() does not change what B.data() points to, so this does not work as intended. How can I change things so that I can compute the matrix multiplication between A and B.transpose()?
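
To make the mismatch concrete, this is how I understand the two addressings (the helper names below are just for illustration, not part of my library):

// B is stored row-major as [N, K], so B(n, k) lives at data[n * K + k].

// What the strided view B.transpose() actually means:
// element (k, n) of the logical [K, N] matrix sits at data[n * K + k].
float bt_strided(const float* b, int K, int k, int n) {
    return b[n * K + k];
}

// What my matmul currently assumes for its second operand:
// a contiguous row-major [K, N] buffer, i.e. element (k, n) at data[k * N + n].
float bt_contiguous(const float* b, int N, int k, int n) {
    return b[k * N + n];
}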

I don't know how to deal with it.
