I would like to convert the following model to ONNX format:
import torch
import torchaudio


class ConformerSpeechRecognizer(torch.nn.Module):
    def __init__(self,
                 kernel_size: int,
                 ffn_dim: int,
                 feature_vector_size: int,
                 hidden_layer_size: int,
                 num_layers: int,
                 num_heads: int,
                 dropout: float,
                 depthwise_conv_kernel_size: int,
                 vocabulary_size: int):
        super().__init__()
        self.vocabulary_size = vocabulary_size
        # Front end: normalize the input features and project them to the Conformer width.
        self.cnn_ = torch.nn.Sequential(
            torch.nn.BatchNorm1d(num_features=feature_vector_size),
            torch.nn.Conv1d(
                in_channels=feature_vector_size,
                out_channels=hidden_layer_size,
                bias=False,
                kernel_size=(kernel_size,),
                padding='same'
            ),
            torch.nn.BatchNorm1d(num_features=hidden_layer_size)
        )
        self.conformer_ = torchaudio.models.Conformer(
            input_dim=hidden_layer_size,
            num_heads=num_heads,
            num_layers=num_layers,
            ffn_dim=ffn_dim,
            depthwise_conv_kernel_size=depthwise_conv_kernel_size,
            dropout=dropout
        )
        self.proba_approximator_ = torch.nn.Linear(
            in_features=hidden_layer_size,
            out_features=vocabulary_size
        )

    def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor) -> torch.Tensor:
        # inputs is (batch, time, features); Conv1d expects (batch, channels, time).
        hidden_states = torch.nn.functional.gelu(self.cnn_(inputs.permute(0, 2, 1)))
        # The Conformer expects (batch, time, features) again.
        hidden_states, _ = self.conformer_(hidden_states.permute(0, 2, 1), input_lengths)
        output_logits = self.proba_approximator_(hidden_states)
        return output_logits
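
For reference, the model can be instantiated along these lines (the hyperparameter values here are illustrative placeholders, sized to be consistent with the 13-dimensional dummy input used below, not necessarily my exact configuration):

model = ConformerSpeechRecognizer(
    kernel_size=3,
    ffn_dim=1024,
    feature_vector_size=13,          # must match the last dimension of the input features
    hidden_layer_size=512,
    num_layers=4,
    num_heads=8,                     # 512 / 8 = 64-dimensional attention heads
    dropout=0.1,
    depthwise_conv_kernel_size=31,   # torchaudio's Conformer requires an odd kernel here
    vocabulary_size=32
)
model.eval()  # export in eval mode so BatchNorm uses running stats and dropout is disabled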
I have issues with converting it, and I'm not sure of their origin. I export the model in the following way:
torch.onnx.export(model.cpu(),
                  f="../model.onnx",
                  input_names=["inputs", "input_lengths"],
                  output_names=["logits"],
                  args=({"inputs": torch.ones(1, 1, 13), "input_lengths": torch.ones(1, dtype=torch.long)}),
                  dynamic_axes={
                      "inputs": {0: "batch_count", 1: "batch_item_length"},
                      "input_lengths": {0: "batch_count"},
                      "logits": {0: "batch_count", 1: "batch_item_length"}
                  },
                  verbose=True)
If I export the model WITHOUT first calling

scripted_model = torch.jit.script(model)

the conversion itself looks fine, apart from this warning:

UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\Convolution.cpp:1041.

However, when I test the exported model, it crashes with:

RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/conformer_/conformer_layers.0/self_attn/Reshape_4' Status Message: D:\bld\onnxruntime_1710148767998\work\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:45 onnxruntime::ReshapeHelper::ReshapeHelper input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{222,3,512}, requested shape:{1,24,64}
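
For completeness, the inference call that triggers this is essentially the following (a sketch assuming onnxruntime's Python API; the shapes are chosen to match the error message above, i.e. a batch of 3 utterances with 222 frames of 13 features each):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("../model.onnx", providers=["CPUExecutionProvider"])

features = np.random.randn(3, 222, 13).astype(np.float32)  # (batch, time, features)
lengths = np.array([222, 222, 222], dtype=np.int64)        # valid frames per batch item

logits = session.run(["logits"], {"inputs": features, "input_lengths": lengths})[0]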
The error looks as if my dynamic_axes settings are incorrect, but I don't understand what is wrong with them. What bothers me even more is this: if I change torch.ones(1, 1, 13) to torch.ones(100, 1, 13), for example, it works (as expected), but if I change it to torch.ones(1, 100, 13), the conversion fails, even though my expectation was that, thanks to dynamic_axes, the converter would not take the first two dimensions into account at all. In practice, though, the second dimension can only be 1.
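
As a sanity check, one can inspect which input dimensions actually ended up symbolic in the exported graph (a sketch using the onnx Python package):

import onnx

model_proto = onnx.load("../model.onnx")
for value_info in model_proto.graph.input:
    dims = [d.dim_param if d.dim_param else d.dim_value  # symbolic name or fixed size
            for d in value_info.type.tensor_type.shape.dim]
    print(value_info.name, dims)
# I would expect ['batch_count', 'batch_item_length', 13] for "inputs";
# a fixed number in place of a name means that axis was baked in during tracing.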
The alternative issue I see is that maybe I need to script the model first (although, according to the docs, I don't think my model requires it). But I can't export the model from a script at all: if I call scripted_model = torch.jit.script(model) and try to export scripted_model, I get
Conda\envs\torch\Lib\site-packages\torch\onnx\symbolic_opset9.py", line 7089, in prim_if
torch._C._jit_pass_onnx_node_shape_type_inference(
RuntimeError: ScalarType UNKNOWN_SCALAR is an unexpected tensor scalar type
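
For completeness, the failing scripted export is essentially the same export call as above with the scripted module swapped in:

scripted_model = torch.jit.script(model)
torch.onnx.export(scripted_model.cpu(),
                  f="../model.onnx",
                  input_names=["inputs", "input_lengths"],
                  output_names=["logits"],
                  args=({"inputs": torch.ones(1, 1, 13), "input_lengths": torch.ones(1, dtype=torch.long)}),
                  dynamic_axes={
                      "inputs": {0: "batch_count", 1: "batch_item_length"},
                      "input_lengths": {0: "batch_count"},
                      "logits": {0: "batch_count", 1: "batch_item_length"}
                  },
                  verbose=True)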
What is wrong with my conversion configuration?