I have a device matrix U of dimensions MxN in column major ordering. Now I'd like to extract the row K into a vector u. Is there a function to accomplish this? Note the copy would need to take into account an offset of K and a stride of M.
I was looking at the function cudaMemcpy2D but it rings no bells, coming from a more LAPACK style API I don't understand what these pitch parameters are, why are they not called simply rows and cols or M and N?
You can use
as