I am using Windows Machine Learning to convert my VideoFrame into a TensorFloat _input (shape: 1,3,256,192; RGB channels + image dimensions), load that into my ONNX model, and receive as _output another TensorFloat object (shape: 1,17,64,48; 17 detected objects + image dimensions).
Now my question: if I want to access that _output TensorFloat, the only way I currently know is _output.data.GetAsVectorView, which gives me one long 1-D vector that I then have to reorder myself, figuring out how the dimensions are laid out in it. Is there a clear rule I can follow to understand how the 4-D tensor is encoded in the 1-D vector? Alternatively, can I somehow access the different dimensions directly from the _output TensorFloat object, since "Shape" shows me that it is a multidimensional array?
Please refer to the layout of Windows ML tensors here:
https://learn.microsoft.com/en-us/uwp/api/windows.ai.machinelearning.tensorfloat?view=winrt-20348
A tensor is a multi-dimensional array of values. A float tensor is a tensor of 32-bit floating point values.
The layout of tensors is row-major, with tightly packed contiguous data representing each dimension. The total size of a tensor is the product of the size of each dimension.
Consider a tensor with:

    Shape = [A, B, C, D]

and you wish to compute the index at Location:

    Location = [a, b, c, d]

Then, you can assume that:

    0 <= a < A,  0 <= b < B,  0 <= c < C,  0 <= d < D

and that the data is row-major, so the last dimension varies fastest. So:

    Index = a*(B*C*D) + b*(C*D) + c*D + d

For your _output shape [1, 17, 64, 48], the element at (0, k, y, x) therefore sits at index k*64*48 + y*48 + x in the vector view.
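A minimal sketch of that rule, in Python for brevity (flat_index is a hypothetical helper, not part of the Windows ML API; the list of coordinate tuples stands in for the flat vector view so the round trip can be checked):

```python
def flat_index(n, k, y, x, shape):
    """Row-major flat index of element (n, k, y, x) in a 4-D tensor of the given shape."""
    N, K, H, W = shape
    assert 0 <= n < N and 0 <= k < K and 0 <= y < H and 0 <= x < W
    # Equivalent to n*(K*H*W) + k*(H*W) + y*W + x, factored Horner-style.
    return ((n * K + k) * H + y) * W + x

# The _output shape from the question: [1, 17, 64, 48].
shape = (1, 17, 64, 48)

# Simulate the 1-D vector view: elements stored in row-major order,
# each "value" here being its own (n, k, y, x) coordinates.
flat = [(n, k, y, x)
        for n in range(shape[0])
        for k in range(shape[1])
        for y in range(shape[2])
        for x in range(shape[3])]

# Total size is the product of the dimensions.
assert len(flat) == 1 * 17 * 64 * 48

# Reading back through flat_index recovers the original coordinates.
assert flat[flat_index(0, 5, 10, 20, shape)] == (0, 5, 10, 20)
```

In C#, the same arithmetic applies to the IVectorView<float> returned by GetAsVectorView: compute the index from (n, k, y, x) and read view[index].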