I need to pre-process the input of an ML model into the correct shape. To do that, I need to transpose a tensor from ncnn in C++. The API does not offer a transpose, so I am trying to implement my own transpose function. The input tensor has the shape (1, 640, 640, 3) (for batch, x, y and color) and I need to reshape it to the shape (1, 3, 640, 640).

How do I properly and efficiently transpose the tensor?

ncnn::Mat preprocess(const cv::Mat& rgba) {
    int width = rgba.cols;
    int height = rgba.rows;

    // Build a tensor from the image input
    ncnn::Mat in = ncnn::Mat::from_pixels(rgba.data, ncnn::Mat::PIXEL_RGBA2RGB, width, height);

    // Set the current shape of the tensor
    in = in.reshape(1, 640, 640, 3);

    // Normalize
    const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
    in.substract_mean_normalize(0, norm_vals);

    // Prepare the transposed matrix
    ncnn::Mat transposed(in.w, in.c, in.h, in.d, sizeof(float));
    ncnn::Mat shape = transposed.shape();

    // Transpose
    for (int i = 0; i < in.w; i++) {
        for (int j = 0; j < in.h; j++) {
            for (int k = 0; k < in.d; k++) {
                for (int l = 0; l < in.c; l++) {
                    int fromIndex = ???;
                    int toIndex = ???;
                    transposed[toIndex] = in[fromIndex];
                }
            }
        }
    }

    return transposed;
}

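I'm only talking about index calculations, not the ncnn API, which I'm not familiar with.

You set fromIndex = i*A + j*B + k*C + l*D and toIndex = i*E + j*F + k*G + l*H, where you compute A B C D E F G H based on the source and target layout. How?

Let's look at a simple 2D transposition first. Transpose a hw layout matrix to a wh layout matrix (slowest-changing dimension first), with i ranging over [0..h) and j over [0..w): fromIndex = i*w + j*1 and toIndex = j*h + i*1.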
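
The 2D case can be sketched as a small self-contained function. This is only an illustration on a plain std::vector<float>, not ncnn code, and the name transpose2d is made up for this example:

```cpp
#include <cstddef>
#include <vector>

// Transpose a flat buffer from layout hw (h rows of w elements) to layout wh.
// transpose2d is a hypothetical helper; the caller must pass src.size() == h * w.
std::vector<float> transpose2d(const std::vector<float>& src, int h, int w) {
    std::vector<float> dst(src.size());
    for (int i = 0; i < h; ++i) {      // i ranges over [0..h)
        for (int j = 0; j < w; ++j) {  // j ranges over [0..w)
            // Source layout hw: drop h, w remains -> coefficient of i;
            // drop w, 1 remains -> coefficient of j.
            std::size_t fromIndex = static_cast<std::size_t>(i) * w + j;
            // Target layout wh: drop w, h remains -> coefficient of j;
            // drop h, 1 remains -> coefficient of i.
            std::size_t toIndex = static_cast<std::size_t>(j) * h + i;
            dst[toIndex] = src[fromIndex];
        }
    }
    return dst;
}
```

For a 2 x 3 input {1, 2, 3, 4, 5, 6}, this yields the 3 x 2 buffer {1, 4, 2, 5, 3, 6}.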
So when computing fromIndex, you start with the source layout (hw). You remove the first letter (h), and what remains (w) is the coefficient that goes with i; you remove the next letter (w), and what remains (1) is the coefficient that goes with j. It is not hard to see that the same pattern works in any number of dimensions. For example, if your source layout is dchw, then you have fromIndex = i*(c*h*w) + j*(h*w) + k*w + l*1.

What about toIndex? Same thing, but rearrange the indices to follow the letters of the target layout, from the slowest-changing to the fastest-changing. For example, if your target layout is hwcd, then the order will be k l j i (because i is the index that ranges over [0..d) in both source and target layouts, etc.). So toIndex = k*(w*c*d) + l*(c*d) + j*d + i*1.

I did not use your layouts on purpose. Do your own calculations a couple of times. You want to develop some intuition about this thing.
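
As a rough sketch of the dchw to hwcd example, again on a plain std::vector<float> rather than an ncnn::Mat (the function name permute_dchw_to_hwcd is an assumption made for illustration, and these are deliberately not the layouts from the question):

```cpp
#include <cstddef>
#include <vector>

// Permute a flat buffer from layout dchw to layout hwcd (slowest-changing first).
// Hypothetical helper; the caller must pass src.size() == d * c * h * w.
std::vector<float> permute_dchw_to_hwcd(const std::vector<float>& src,
                                        std::size_t d, std::size_t c,
                                        std::size_t h, std::size_t w) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < d; ++i) {              // i ranges over [0..d)
        for (std::size_t j = 0; j < c; ++j) {          // j ranges over [0..c)
            for (std::size_t k = 0; k < h; ++k) {      // k ranges over [0..h)
                for (std::size_t l = 0; l < w; ++l) {  // l ranges over [0..w)
                    // Source dchw: strip letters left to right; what remains
                    // is the coefficient of i, j, k, l in that order.
                    std::size_t fromIndex = i * (c * h * w) + j * (h * w) + k * w + l;
                    // Target hwcd: same rule, with indices in the order k l j i.
                    std::size_t toIndex = k * (w * c * d) + l * (c * d) + j * d + i;
                    dst[toIndex] = src[fromIndex];
                }
            }
        }
    }
    return dst;
}
```

The loop order follows the source layout, so reads are sequential and writes are strided. Whether it is faster to iterate in source or target order depends on the sizes involved and is worth measuring.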