Missing Weight Vectors when converting from PyTorch to CoreML via ONNX

I am trying to convert a PyTorch model to CoreML via ONNX, but the ONNX-to-CoreML conversion fails and reports missing weight tensors.

I am following the tutorial here which makes this statement:

Step 3: Converting the model to CoreML

It's as easy as running the convert function. The resulting object is a coremltools MLModel object that you can save to a file and import in XCode later.
cml = onnx_coreml.convert(model)

Unfortunately when I try to do this it fails horribly.

Here's my code:

# convert.py
import torch
import torch.onnx
from torch.autograd import Variable

import onnx
from onnx_coreml import convert

from hourglass_model import Hourglass

model_no = 1
torch_model = Hourglass(joint_count=14, size=256)
state_dict = torch.load("hourglass_model_{}.model".format(model_no))
torch_model.load_state_dict(state_dict)
torch_model.train(False)
torch_model.eval()

# Dummy Input to the model
x = Variable(torch.randn(1,3,256,256,dtype=torch.float32))

# Export the model
onnx_filename = "test_hourglass.onnx"
torch_out = torch.onnx.export(torch_model, x, onnx_filename, export_params=False) 

# Load back in ONNX model
onnx_model = onnx.load(onnx_filename)

# Check that the IR is well formed
onnx.checker.check_model(onnx_model)

# Print a human readable representation of the graph
graph = onnx.helper.printable_graph(onnx_model.graph)
print(graph)

coreml_model = convert(onnx_model,
    add_custom_layers=True,
    image_input_names=["input"], 
    image_output_names=["output"])
coreml_model.save('test_hourglass.mlmodel')

Here's what the print(graph) line gives.

graph torch-jit-export (
  %0[FLOAT, 1x3x256x256]
  %1[FLOAT, 64x3x5x5]
  %2[FLOAT, 64]
  %3[FLOAT, 64x64x5x5]
  %4[FLOAT, 64]
  %5[FLOAT, 64x64x5x5]
  %6[FLOAT, 64]
  %7[FLOAT, 64x64x5x5]
  %8[FLOAT, 64]
  %9[FLOAT, 64x64x5x5]
  %10[FLOAT, 64]
  %11[FLOAT, 64x64x5x5]
  %12[FLOAT, 64]
  %13[FLOAT, 64x64x5x5]
  %14[FLOAT, 64]
  %15[FLOAT, 64x64x1x1]
  %16[FLOAT, 64]
  %17[FLOAT, 14x64x1x1]
  %18[FLOAT, 14]
) {
  %19 = Conv[dilations = [1, 1], group = 1, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%0, %1, %2)
  %20 = Relu(%19)
  %21 = MaxPool[kernel_shape = [4, 4], pads = [0, 0, 0, 0], strides = [4, 4]](%20)
  %22 = Conv[dilations = [1, 1], group = 1, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%21, %3, %4)
  %23 = Relu(%22)
  %24 = MaxPool[kernel_shape = [4, 4], pads = [0, 0, 0, 0], strides = [4, 4]](%23)
  %25 = Conv[dilations = [1, 1], group = 1, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%24, %5, %6)
  %26 = Relu(%25)
  %27 = Conv[dilations = [1, 1], group = 1, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%26, %7, %8)
  %28 = Relu(%27)
  %29 = Conv[dilations = [1, 1], group = 1, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%28, %9, %10)
  %30 = Relu(%29)
  %31 = Upsample[height_scale = 4, mode = 'nearest', width_scale = 4](%30)
  %32 = Add(%31, %23)
  %33 = Conv[dilations = [1, 1], group = 1, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%32, %11, %12)
  %34 = Relu(%33)
  %35 = Upsample[height_scale = 4, mode = 'nearest', width_scale = 4](%34)
  %36 = Add(%35, %20)
  %37 = Conv[dilations = [1, 1], group = 1, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%36, %13, %14)
  %38 = Relu(%37)
  %39 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%38, %15, %16)
  %40 = Relu(%39)
  %41 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%40, %17, %18)
  %42 = Relu(%41)
  return %42
}

And this is the error message:

1/24: Converting Node Type Conv
Traceback (most recent call last):
  File "convert.py", line 38, in <module>
    image_output_names=["output"])
  File "/Users/stephenf/Developer/miniconda3/envs/pytorch/lib/python3.6/site-packages/onnx_coreml/converter.py", line 396, in convert
    _convert_node(builder, node, graph, err)
  File "/Users/stephenf/Developer/miniconda3/envs/pytorch/lib/python3.6/site-packages/onnx_coreml/_operators.py", line 994, in _convert_node
    return converter_fn(builder, node, graph, err)
  File "/Users/stephenf/Developer/miniconda3/envs/pytorch/lib/python3.6/site-packages/onnx_coreml/_operators.py", line 31, in _convert_conv
    "Weight tensor: {} not found in the graph initializer".format(weight_name,))
  File "/Users/stephenf/Developer/miniconda3/envs/pytorch/lib/python3.6/site-packages/onnx_coreml/_error_utils.py", line 71, in missing_initializer
    format(node.op_type, node.inputs[0], node.outputs[0], err_message)
ValueError: Missing initializer error in op of type Conv, with input name = 0, output name = 19. Error message: Weight tensor: 1 not found in the graph initializer

From what I can gather, it says the weight tensor %1[FLOAT, 64x3x5x5] is missing. This is how I'm saving the model:

torch.save(model.state_dict(), "hourglass_model_{}.model".format(epoch))

ONNX loads it fine; only the step that converts from ONNX to CoreML fails.

Any help in figuring this out would be greatly appreciated. I'm sure I've done a bunch of other things wrong, but I just need this thing to export for now.

Thanks,

Best answer

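You are calling torch.onnx.export with export_params=False, which, according to the 0.3.1 documentation, saves the model architecture without the actual parameter tensors. The exported file then declares the weights as graph inputs but stores no values for them, so onnx_coreml cannot find them in the graph initializer. The more recent documentation doesn't spell this out, but the "Weight tensor: 1 not found in the graph initializer" error you are getting is consistent with it.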

Try it with export_params=True (which is also the default); you should see the saved model's size increase notably.

Glad it helped!
Andres