I am working on a de-weatherization Android app based on a pix2pix model (similar to U-Net). The app mainly uses the phone camera (OnePlus 7) to capture images, de-weatherize them, and display the result in the main interface. The deep learning inference uses Qualcomm's SNPE framework.
Currently, we have encountered a problem: the model output converted to a Bitmap is misaligned, as shown in the figure.
Here's the inference code:
final List<String> result = new LinkedList<>();

final FloatTensor tensor = mNeuralNetwork.createFloatTensor(
        mNeuralNetwork.getInputTensorsShapes().get(mInputLayer));
Log.e("[MODEL]", "create tensor");

Bitmap smImage = Bitmap.createScaledBitmap(mImage, 1080, 720, true);
final int[] dimensions = tensor.getShape();
final boolean isGrayScale = (dimensions[dimensions.length - 1] == 1);
float[] rgbBitmapAsFloat;
if (!isGrayScale) {
    rgbBitmapAsFloat = loadRgbBitmapAsFloat(smImage);
} else {
    rgbBitmapAsFloat = loadGrayScaleBitmapAsFloat(smImage);
}
tensor.write(rgbBitmapAsFloat, 0, rgbBitmapAsFloat.length);
Log.e("[MODEL]", "create tensor done!");

final Map<String, FloatTensor> inputs = new HashMap<>();
inputs.put(mInputLayer, tensor);
Log.e("[MODEL]", "create input tensor done!");

final long javaExecuteStart = SystemClock.elapsedRealtime();
final Map<String, FloatTensor> outputs = mNeuralNetwork.execute(inputs);
Log.e("[MODEL]", "model execute!");
final long javaExecuteEnd = SystemClock.elapsedRealtime();
mJavaExecuteTime = javaExecuteEnd - javaExecuteStart;

FloatTensor outputTensor = new FloatTensor() {
    @Override
    public void write(float[] floats, int i, int i1, int... ints) {
    }

    @Override
    public void write(float v, int... ints) {
    }

    @Override
    public int read(float[] floats, int i, int i1, int... ints) {
        return 0;
    }

    @Override
    public float read(int... ints) {
        return 0;
    }

    @Override
    public void release() {
    }
};

for (Map.Entry<String, FloatTensor> output : outputs.entrySet()) {
    Log.e("[MODEL]", "output_layer: " + output.getKey());
    if (output.getKey().equals(mOutputLayer)) {
        outputTensor = output.getValue();
        Log.e("[MODEL]", "output_layer: " + output.getKey() + ", shape: " +
                String.valueOf(outputTensor.getShape()[0]) + " " +
                String.valueOf(outputTensor.getShape()[1]) + " " +
                String.valueOf(outputTensor.getShape()[2]) + " " +
                String.valueOf(outputTensor.getShape()[3]) + " ");
    }
}
return outputTensor;
And here's the code that converts the SNPE FloatTensor to a Java Bitmap:
final float[] pixelsBatched = new float[tensor.getSize()];
tensor.read(pixelsBatched, 0, tensor.getSize());
Log.i("[IMAGE]", "size: " + String.valueOf(tensor.getSize()));

int w = 1080;
int h = 720;
Bitmap img = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
        float r = pixelsBatched[y * w * 3 + x * 3 + 0] * 255;
        float g = pixelsBatched[y * w * 3 + x * 3 + 1] * 255;
        float b = pixelsBatched[y * w * 3 + x * 3 + 2] * 255;
        int color = ((int) r << 16) | ((int) g << 8) | (int) b | 0xFF000000;
        img.setPixel(x, y, color);
    }
}
return img;
To further analyze this issue, I returned the input tensor directly instead of running inference on it:

return tensor;

After converting the input tensor to a Bitmap, I found that the image is correct. Therefore, I suspect the problem is in the inference step.
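A further check (a sketch, reusing mNeuralNetwork and outputTensor from the inference code above) would be to log the declared input and output tensor shapes; a layout difference between them, e.g. NHWC on the input side and NCHW on the output side, would produce exactly this kind of misalignment:

// Sketch: compare the declared input and output tensor shapes; a mismatch
// (e.g. NHWC in, NCHW out) would explain the striped/misaligned Bitmap.
// Uses mNeuralNetwork and outputTensor from the code above, plus java.util.Arrays.
final int[] inShape = mNeuralNetwork.getInputTensorsShapes().get(mInputLayer);
final int[] outShape = outputTensor.getShape();
Log.e("[MODEL]", "input shape:  " + Arrays.toString(inShape));
Log.e("[MODEL]", "output shape: " + Arrays.toString(outShape));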
I used the PyTorch framework for training, and the trained model was exported to ONNX. I tested the model in the PyTorch framework and it outputs the correct image. The model was then simplified with onnx-sim and converted to a DLC model by SNPE's conversion tool. The structure of the ONNX network is shown below.
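For completeness, the converted DLC is loaded on the device through SNPE's NeuralNetworkBuilder, roughly along these lines (a sketch following the SNPE SDK sample pattern; the application reference, model file name, and runtime order are placeholders, not the exact code in my app):

// Sketch: building the NeuralNetwork from the converted DLC with the SNPE
// Java API (builder pattern from the SDK samples). The model file name and
// runtime order are placeholders.
final NeuralNetwork network = new SNPE.NeuralNetworkBuilder(application)
        .setRuntimeOrder(NeuralNetwork.Runtime.GPU, NeuralNetwork.Runtime.CPU)
        .setModel(new File(application.getFilesDir(), "deweather.dlc"))
        .build();
mNeuralNetwork = network;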
I would like to ask what the possible reasons for this misalignment could be. Thank you very much!
#################### Update! ####################

The per-pixel read was changed to channel-planar indexing:

int channelSize = w * h;
float r = pixelsBatched[y * w + x] * 255;
float g = pixelsBatched[y * w + x + channelSize] * 255;
float b = pixelsBatched[y * w + x + 2 * channelSize] * 255;
############################ Update! ##############################
The result of snpe-dlc-viewer:

The information of input layer:

The information of output layer:

############################ Update! ##########################
float[] loadRgbBitmapAsFloat(Bitmap image) {
    final int[] pixels = new int[image.getWidth() * image.getHeight()];
    image.getPixels(pixels, 0, image.getWidth(), 0, 0,
            image.getWidth(), image.getHeight());

    final float[] pixelsBatched = new float[pixels.length * 3];
    for (int y = 0; y < image.getHeight(); y++) {
        for (int x = 0; x < image.getWidth(); x++) {
            final int idx = y * image.getWidth() + x;
            final int batchIdx = idx * 3;

            final float[] rgb = extractColorChannels(pixels[idx]);
            pixelsBatched[batchIdx] = rgb[0];
            pixelsBatched[batchIdx + 1] = rgb[1];
            pixelsBatched[batchIdx + 2] = rgb[2];
        }
    }
    return pixelsBatched;
}
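extractColorChannels just unpacks an ARGB pixel into per-channel floats; a sketch of what it is expected to look like, assuming plain [0, 1] scaling to match the * 255 in the conversion back:

// Sketch of extractColorChannels: unpack one ARGB_8888 pixel into R, G, B
// floats in [0, 1]. If the network was trained with a different normalization
// (e.g. (x - 0.5) / 0.5, common for pix2pix), adjust here accordingly.
float[] extractColorChannels(int pixel) {
    final float r = ((pixel >> 16) & 0xFF) / 255.0f;
    final float g = ((pixel >> 8) & 0xFF) / 255.0f;
    final float b = (pixel & 0xFF) / 255.0f;
    return new float[] {r, g, b};
}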

I think you may have gotten the layout of the output tensor wrong. When you iterate over the output tensor like this:

float r = pixelsBatched[y * w * 3 + x * 3 + 0] * 255;
float g = pixelsBatched[y * w * 3 + x * 3 + 1] * 255;
float b = pixelsBatched[y * w * 3 + x * 3 + 2] * 255;

you read the R, G, and B values sequentially. However, the output tensor layout is 1x3x1080x720, meaning all R values are stored sequentially, then all G values, then all B values. So, you need to define

int channelSize = w * h;

then, you read them like this:

float r = pixelsBatched[y * w + x] * 255;
float g = pixelsBatched[y * w + x + channelSize] * 255;
float b = pixelsBatched[y * w + x + 2 * channelSize] * 255;
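Putting it together, a sketch of the corrected FloatTensor-to-Bitmap conversion under this channel-planar assumption (the helper name is hypothetical; w = 1080, h = 720 as in the question, and the values are assumed to already be in [0, 1], so clamp before casting if the generator can overshoot that range):

// Sketch: full corrected conversion for a channel-planar (1 x 3 x h x w)
// output. Assumes values already in [0, 1]; clamp the floats before casting
// if the generator can produce values outside that range.
Bitmap tensorToBitmap(FloatTensor tensor, int w, int h) {
    final float[] pixelsBatched = new float[tensor.getSize()];
    tensor.read(pixelsBatched, 0, tensor.getSize());

    final int channelSize = w * h;
    final Bitmap img = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            final int r = (int) (pixelsBatched[y * w + x] * 255);
            final int g = (int) (pixelsBatched[y * w + x + channelSize] * 255);
            final int b = (int) (pixelsBatched[y * w + x + 2 * channelSize] * 255);
            img.setPixel(x, y, 0xFF000000 | (r << 16) | (g << 8) | b);
        }
    }
    return img;
}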