How to transpose a 4D tensor in C++?


I need to pre-process the input of an ML model into the correct shape. In order to do that, I need to transpose a tensor from ncnn in C++. The API does not offer a transpose, so I am trying to implement my own transpose function.

The input tensor has the shape (1, 640, 640, 3) (for batch, x, y and color) and I need to reshape it to the shape (1, 3, 640, 640).

How do I properly and efficiently transpose the tensor?

ncnn::Mat preprocess(const cv::Mat& rgba) {
    int width = rgba.cols;
    int height = rgba.rows;

    // Build a tensor from the image input
    ncnn::Mat in = ncnn::Mat::from_pixels(rgba.data, ncnn::Mat::PIXEL_RGBA2RGB, width, height);

    // Set the current shape of the tensor
    in = in.reshape(1, 640, 640, 3);

    // Normalize
    const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
    in.substract_mean_normalize(0, norm_vals);

    // Prepare the transposed matrix
    ncnn::Mat transposed(in.w, in.c, in.h, in.d, sizeof(float));

    // Transpose
    
    for (int i = 0; i < in.w; i++) {
        for (int j = 0; j < in.h; j++) {
            for (int k = 0; k < in.d; k++) {
                for (int l = 0; l < in.c; l++) {
                    int fromIndex = ???;
                    int toIndex = ???;
                    transposed[toIndex] = in[fromIndex];
                }
            }
        }
    }

    return transposed; 
}

Accepted answer:

I'm only talking about the index calculations, not the ncnn API, which I'm not familiar with.

You set

fromIndex = i*A + j*B + k*C + l*D;
toIndex   = i*E + j*F + k*G + l*H;

where the coefficients A, B, C, D, E, F, G, H are computed from the source and target layouts. How?

Let's look at a simple 2D transposition first. Transpose a hw-layout matrix to a wh-layout matrix (slowest-changing dimension first):

  for (int i = 0; i < h; ++i) {
      for (int j = 0; j < w; ++j) {
          int fromIndex = i * w + j * 1;
          //              ^       ^
          //              |       |
          //             i<h     j<w        <---- hw layout

          int   toIndex = j * h + i * 1;
          //              ^       ^
          //              |       |
          //             j<w     i<h        <---- wh layout
      }      
  }      
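
If it helps to see the whole recipe run end to end, here is a minimal, self-contained 2D sketch using plain float buffers. Nothing here is ncnn-specific; the function name transpose2d and the sample sizes are made up for the example.

#include <cstdio>
#include <vector>

// Copy an h-by-w matrix stored in hw layout into a w-by-h matrix in wh layout.
void transpose2d(const std::vector<float>& src, std::vector<float>& dst, int h, int w) {
    for (int i = 0; i < h; ++i) {
        for (int j = 0; j < w; ++j) {
            int fromIndex = i * w + j;  // hw layout
            int toIndex   = j * h + i;  // wh layout
            dst[toIndex] = src[fromIndex];
        }
    }
}

int main() {
    const int h = 2, w = 3;
    std::vector<float> src = {1, 2, 3,
                              4, 5, 6};
    std::vector<float> dst(w * h);
    transpose2d(src, dst, h, w);
    // dst now holds {1, 4, 2, 5, 3, 6}, i.e. the 3-by-2 transpose.
    for (int j = 0; j < w; ++j) {
        for (int i = 0; i < h; ++i)
            std::printf("%g ", dst[j * h + i]);
        std::printf("\n");
    }
}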

So when computing fromIndex, you start with the source layout (hw): remove the first letter (h), and what remains (w) is the coefficient that goes with i; remove the next letter (w), and what remains (1) is the coefficient that goes with j. The same kind of pattern works in any number of dimensions. For example, if your source layout is dchw, then you have

fromIndex = i * (c*h*w) + j * (h*w) + k * (w) + l * (1);
//          ^             ^           ^         ^
//          |             |           |         |
//         i<d           j<c         k<h       l<w   <---- dchw

What about toIndex? Same thing but rearrange the letters from the slowest-changing to the fastest-changing in the target layout. For example, if your target layout is hwcd, then the order will be k l j i (because i is the index that ranges over [0..d), in both source and target layouts, etc). So

  toIndex = k * (w*c*d) + l * (c*d) + j * (d) + i * (1);
  //        ^             ^           ^         ^
  //        |             |           |         |
  //       k<h           l<w         j<c       i<d   <---- hwcd

I did not use your layouts on purpose. Do your own calculations a couple of times. You want to develop some intuition about this thing.
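
For reference, applying the same recipe to the layouts from the question (source nhwc, i.e. (batch, y, x, color); target nchw) gives the sketch below. It works on plain, densely packed float buffers; whether an ncnn::Mat is laid out exactly this densely (it may pad or align each channel internally) is an assumption you would have to check before indexing it as one flat array.

// Rearrange a densely packed NHWC float buffer into NCHW.
// N = batch, H = rows (y), W = columns (x), C = channels (color).
void nhwc_to_nchw(const float* src, float* dst, int N, int H, int W, int C) {
    for (int n = 0; n < N; ++n) {
        for (int y = 0; y < H; ++y) {
            for (int x = 0; x < W; ++x) {
                for (int c = 0; c < C; ++c) {
                    int fromIndex = n * (H * W * C) + y * (W * C) + x * C + c; // nhwc
                    int toIndex   = n * (C * H * W) + c * (H * W) + y * W + x; // nchw
                    dst[toIndex] = src[fromIndex];
                }
            }
        }
    }
}

For the shapes in the question (N=1, H=640, W=640, C=3), dst then holds the (1, 3, 640, 640) tensor the model expects.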