I started with a DeepLabV3+ mlmodel that outputs a 2D MultiArray (the segmentation map). I successfully added a layer that takes this as input and outputs a GRAYSCALE image.
Now I would like to take this grayscale image as input and output ARGB, in which I would like to make one of the colors transparent.
How do I set up such a layer?
My Python code for this:
import coremltools
import coremltools.proto.FeatureTypes_pb2 as ft

coreml_model = coremltools.models.MLModel('DeepLabKP.mlmodel')
spec = coreml_model.get_spec()
spec_layers = getattr(spec, spec.WhichOneof("Type")).layers

# Find the current output layer and save it for later reference
last_layer = spec_layers[-1]

# Add the post-processing layer
new_layer = spec_layers.add()
new_layer.name = 'image_gray_to_RGB'

# Configure it as a linear activation layer (y = 255 * x + 0)
new_layer.activation.linear.alpha = 255
new_layer.activation.linear.beta = 0

# Use the original model's output as input to this layer
new_layer.input.append(last_layer.output[0])

# Name the output for later reference when saving the model
new_layer.output.append('image_gray_to_RGB')

# Find the original model's output description
output_description = next(x for x in spec.description.output if x.name == last_layer.output[0])

# Update it to use the new layer as output
output_description.name = new_layer.name

# Function to mark the model output as an RGB image
# https://forums.developer.apple.com/thread/81571#241998
def convert_grayscale_image_to_RGB(spec, feature_name, is_bgr=False):
    """
    Convert an output image feature to the RGB (or BGR) color space.
    This will modify the Model_pb spec passed in.

    Example:
        model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
        spec = model.get_spec()
        convert_grayscale_image_to_RGB(spec, 'imageOutput', is_bgr=False)
        newModel = coremltools.models.MLModel(spec)
        newModel.save('MyNeuralNetworkWithImageOutput.mlmodel')

    Parameters
    ----------
    spec: Model_pb
        The specification containing the output feature to convert
    feature_name: str
        The name of the image output feature you want to convert
    is_bgr: boolean
        If the image has 3 channels, set to True for BGR pixel order or False for RGB
    """
    for output in spec.description.output:
        if output.name != feature_name:
            continue
        if output.type.WhichOneof('Type') != 'imageType':
            raise ValueError("%s is not an image type" % output.name)
        color_space = 'BGR' if is_bgr else 'RGB'
        output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value(color_space)

# Mark the new layer's output as an image
convert_grayscale_image_to_RGB(spec, output_description.name, is_bgr=False)

updated_model = coremltools.models.MLModel(spec)
updated_model.author = 'Saran'
updated_model.license = 'MIT'
updated_model.short_description = 'Inherits DeepLab V3+ and adds a layer to turn scores into an image'
updated_model.input_description['image'] = 'Input Image'
updated_model.output_description[output_description.name] = 'RGB Image'

model_file_name = 'DeepLabKP-G2R.mlmodel'
updated_model.save(model_file_name)
While the model saves without any error, prediction fails as below:
result = model.predict({'image': img})
File "/Users/saran/Library/Python/2.7/lib/python/site-packages/coremltools/models/model.py", line 336, in predict
return self.__proxy__.predict(data, useCPUOnly)
RuntimeError: {
NSLocalizedDescription = "Failed to convert output image_gray_to_RGB to image";
NSUnderlyingError = "Error Domain=com.apple.CoreML Code=0 \"Invalid array shape (\n 1,\n 513,\n 513\n) for converting to gray image\" UserInfo={NSLocalizedDescription=Invalid array shape (\n 1,\n 513,\n 513\n) for converting to gray image}";
}
I feel like it has to do with how the activation is set in this layer, but I couldn't find anything different to try.
Any help is very much appreciated.
The grayscale image that the layer I added is producing:
It looks like your output has shape (1, 513, 513). The first number, 1, is the number of channels. Since this is 1, Core ML can only turn the output into a grayscale image. A color image needs 3 channels, or a shape of (3, 513, 513).
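If you do want to stay inside the model, one way to get 3 channels is to concatenate the grayscale channel with itself along the channel axis. A minimal sketch against the NeuralNetwork spec (untested with your model; the layer and blob names are placeholders):

gray_name = last_layer.output[0]            # the single-channel blob
concat_layer = spec_layers.add()
concat_layer.name = 'gray_to_rgb'
concat_layer.input.append(gray_name)        # feeding the same blob in
concat_layer.input.append(gray_name)        # three times turns (1, 513, 513)
concat_layer.input.append(gray_name)        # into (3, 513, 513)
concat_layer.output.append('rgb_out')
concat_layer.concat.sequenceConcat = False  # concatenate along the channel axis

You would then point spec.description.output at 'rgb_out' and mark it as RGB, as you already do. Note that Core ML image outputs only support the GRAYSCALE, RGB, and BGR color spaces, so you cannot get the ARGB transparency you asked about straight out of the model; the alpha channel has to be added in post-processing.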
Since this is DeepLab, I'm assuming your grayscale image doesn't really have "colors" in it but the index of the class (in other words, you've taken the ARGMAX over the predictions). The easiest way, in my opinion, to turn this grayscale "image" (segmentation mask, really) into a color image is to do this in Swift or in Metal.
Here is a source code example: https://github.com/hollance/SemanticSegmentationMetalDemo
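To illustrate what that post-processing does, here is the same per-pixel lookup in Python/NumPy (the palette and class count here are made up; the Metal version performs the identical lookup on the GPU):

import numpy as np

# Hypothetical RGBA palette, one row per class index.
# Alpha = 0 makes that class fully transparent (e.g. the background).
PALETTE = np.array([
    [  0,   0,   0,   0],  # class 0: background -> transparent
    [128,   0,   0, 255],  # class 1
    [  0, 128,   0, 255],  # class 2
], dtype=np.uint8)

def colorize_mask(mask):
    # mask: (H, W) array of class indices -> (H, W, 4) RGBA image
    return PALETTE[mask]

mask = np.random.randint(0, len(PALETTE), size=(513, 513))
rgba = colorize_mask(mask)  # (513, 513, 4), ready to composite over the photo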