I am working on a de-weatherization Android app based on a pix2pix model (similar to U-Net). The app uses the phone camera (OnePlus 7) to capture images, de-weatherizes them, and displays the result in the main interface. The deep learning inference runs on Qualcomm's SNPE framework.
Currently, we have hit a problem: the Bitmap converted from the model output is misaligned, as shown in the figure.
Here's the inference code:
final List<String> result = new LinkedList<>();

// Create an input tensor with the shape registered for the input layer.
final FloatTensor tensor = mNeuralNetwork.createFloatTensor(
        mNeuralNetwork.getInputTensorsShapes().get(mInputLayer));
Log.e("[MODEL]", "create tensor");

// Scale the captured image to the model's input resolution.
Bitmap smImage = Bitmap.createScaledBitmap(mImage, 1080, 720, true);

final int[] dimensions = tensor.getShape();
final boolean isGrayScale = (dimensions[dimensions.length - 1] == 1);
float[] rgbBitmapAsFloat;
if (!isGrayScale) {
    rgbBitmapAsFloat = loadRgbBitmapAsFloat(smImage);
} else {
    rgbBitmapAsFloat = loadGrayScaleBitmapAsFloat(smImage);
}
tensor.write(rgbBitmapAsFloat, 0, rgbBitmapAsFloat.length);
Log.e("[MODEL]", "create tensor done!");

final Map<String, FloatTensor> inputs = new HashMap<>();
inputs.put(mInputLayer, tensor);
Log.e("[MODEL]", "create input tensor done!");

// Run inference and time it.
final long javaExecuteStart = SystemClock.elapsedRealtime();
final Map<String, FloatTensor> outputs = mNeuralNetwork.execute(inputs);
Log.e("[MODEL]", "model execute!");
final long javaExecuteEnd = SystemClock.elapsedRealtime();
mJavaExecuteTime = javaExecuteEnd - javaExecuteStart;

// Placeholder tensor so the method always has something to return.
FloatTensor outputTensor = new FloatTensor() {
    @Override
    public void write(float[] floats, int i, int i1, int... ints) {
    }

    @Override
    public void write(float v, int... ints) {
    }

    @Override
    public int read(float[] floats, int i, int i1, int... ints) {
        return 0;
    }

    @Override
    public float read(int... ints) {
        return 0;
    }

    @Override
    public void release() {
    }
};

// Pick the tensor that belongs to the configured output layer.
for (Map.Entry<String, FloatTensor> output : outputs.entrySet()) {
    Log.e("[MODEL]", "output_layer: " + output.getKey());
    if (output.getKey().equals(mOutputLayer)) {
        outputTensor = output.getValue();
        Log.e("[MODEL]", "output_layer: " + output.getKey() + ", shape: " +
                String.valueOf(outputTensor.getShape()[0]) + " " +
                String.valueOf(outputTensor.getShape()[1]) + " " +
                String.valueOf(outputTensor.getShape()[2]) + " " +
                String.valueOf(outputTensor.getShape()[3]) + " ");
    }
}
return outputTensor;
And here is the code that converts the SNPE FloatTensor to a Java Bitmap:
final float[] pixelsBatched = new float[tensor.getSize()];
tensor.read(pixelsBatched, 0, tensor.getSize());
Log.i("[IMAGE]", "size: " + String.valueOf(tensor.getSize()));

int w = 1080;
int h = 720;
Bitmap img = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
        // Read R, G, B assuming an interleaved (HWC) layout.
        float r = pixelsBatched[y * w * 3 + x * 3 + 0] * 255;
        float g = pixelsBatched[y * w * 3 + x * 3 + 1] * 255;
        float b = pixelsBatched[y * w * 3 + x * 3 + 2] * 255;
        int color = ((int) r << 16) | ((int) g << 8) | (int) b | 0xFF000000;
        img.setPixel(x, y, color);
    }
}
return img;
To further analyze this issue, I returned the input tensor directly instead of running inference:
return tensor;
After converting that input tensor to a Bitmap, the image is correct. Therefore, I suspect something goes wrong around the inference step.
I used the PyTorch framework for training and exported the trained model to ONNX. I tested the model in PyTorch and it outputs the correct image. The model was then simplified with onnx-sim and converted to a DLC model with SNPE's conversion tool. The structure of the ONNX network is shown below.
I would like to ask what the possible reasons for this misalignment are. Thank you very much!
#################### Update! ####################
I changed the channel indexing in the conversion loop to read a planar (CHW) layout:
int channelSize = w * h;
float r = pixelsBatched[y * w + x] * 255;
float g = pixelsBatched[y * w + x + channelSize] * 255;
float b = pixelsBatched[y * w + x + 2 * channelSize] * 255;
############################ Update! ##############################
The result of snpe-dlc-viewer:
The information of the input layer:
The information of the output layer:
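The same shape information can also be logged at runtime with the API calls already used in the inference code above; a small sketch (mInputLayer, mOutputLayer and outputs as in that code):

// Shape SNPE registered for the input layer (the layout the input buffer must follow).
final int[] inputShape = mNeuralNetwork.getInputTensorsShapes().get(mInputLayer);
Log.i("[MODEL]", "input shape: " + java.util.Arrays.toString(inputShape));

// Shape of the output tensor returned by execute(): a 3 in the last position
// indicates an interleaved HWC layout, a 3 right after the batch dimension
// indicates a planar CHW layout.
final FloatTensor outTensor = outputs.get(mOutputLayer);
Log.i("[MODEL]", "output shape: " + java.util.Arrays.toString(outTensor.getShape()));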
############################ Update! ##########################
float[] loadRgbBitmapAsFloat(Bitmap image) {
    final int[] pixels = new int[image.getWidth() * image.getHeight()];
    image.getPixels(pixels, 0, image.getWidth(), 0, 0,
            image.getWidth(), image.getHeight());

    // Write the input as interleaved (HWC) floats: R, G, B per pixel.
    final float[] pixelsBatched = new float[pixels.length * 3];
    for (int y = 0; y < image.getHeight(); y++) {
        for (int x = 0; x < image.getWidth(); x++) {
            final int idx = y * image.getWidth() + x;
            final int batchIdx = idx * 3;
            final float[] rgb = extractColorChannels(pixels[idx]);
            pixelsBatched[batchIdx] = rgb[0];
            pixelsBatched[batchIdx + 1] = rgb[1];
            pixelsBatched[batchIdx + 2] = rgb[2];
        }
    }
    return pixelsBatched;
}
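(extractColorChannels is omitted above; it just unpacks an ARGB pixel into per-channel floats, roughly along these lines, assuming normalization to [0, 1]:)

// Rough sketch of extractColorChannels: unpack ARGB into RGB floats in [0, 1].
// The exact normalization (e.g. [-1, 1], as is common for pix2pix) may differ.
float[] extractColorChannels(int pixel) {
    final float r = ((pixel >> 16) & 0xFF) / 255f;
    final float g = ((pixel >> 8) & 0xFF) / 255f;
    final float b = (pixel & 0xFF) / 255f;
    return new float[]{r, g, b};
}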
I think you may have gotten the layout of the output tensor wrong. When you iterate over the output tensor like this:

float r = pixelsBatched[y * w * 3 + x * 3 + 0] * 255;
float g = pixelsBatched[y * w * 3 + x * 3 + 1] * 255;
float b = pixelsBatched[y * w * 3 + x * 3 + 2] * 255;

you read the R, G, and B values of each pixel sequentially (interleaved, HWC). However, the output tensor layout is 1x3x1080x720 (channel-first, CHW), meaning all R values are stored first, then all G values, then all B values. So, you need to define

int channelSize = w * h;

and then read them like this:

float r = pixelsBatched[y * w + x] * 255;
float g = pixelsBatched[y * w + x + channelSize] * 255;
float b = pixelsBatched[y * w + x + 2 * channelSize] * 255;
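Putting it together, the conversion loop would look roughly like this (untested sketch; it also fills an int[] and writes it with setPixels() instead of calling setPixel() per pixel, which is noticeably faster):

// Convert a CHW float tensor (values assumed in [0, 1]) to an ARGB_8888 Bitmap.
final float[] pixelsBatched = new float[tensor.getSize()];
tensor.read(pixelsBatched, 0, tensor.getSize());

final int w = 1080;
final int h = 720;
final int channelSize = w * h;
final int[] argb = new int[channelSize];

for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
        final int idx = y * w + x;
        final int r = (int) (pixelsBatched[idx] * 255);
        final int g = (int) (pixelsBatched[idx + channelSize] * 255);
        final int b = (int) (pixelsBatched[idx + 2 * channelSize] * 255);
        argb[idx] = 0xFF000000 | (r << 16) | (g << 8) | b;
    }
}

// Write all pixels at once instead of per-pixel setPixel() calls.
Bitmap img = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
img.setPixels(argb, 0, w, 0, 0, w, h);
return img;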