I would like to convert the following model to ONNX format:

import torch
import torchaudio

class ConformerSpeechRecognizer(torch.nn.Module):
    def __init__(self,
                 kernel_size: int,
                 ffn_dim: int,
                 feature_vector_size: int,
                 hidden_layer_size: int,
                 num_layers: int,
                 num_heads: int,
                 dropout: float,
                 depthwise_conv_kernel_size: int,
                 vocabulary_size: int):
        super().__init__()
        self.vocabulary_size = vocabulary_size
        self.cnn_ = torch.nn.Sequential(
            torch.nn.BatchNorm1d(num_features=feature_vector_size),
            torch.nn.Conv1d(
                in_channels=feature_vector_size,
                out_channels=hidden_layer_size,
                bias=False,
                kernel_size=(kernel_size,),
                padding='same'
            ),
            torch.nn.BatchNorm1d(num_features=hidden_layer_size)
        )
        self.conformer_ = torchaudio.models.Conformer(
            input_dim=hidden_layer_size,
            num_heads=num_heads,
            num_layers=num_layers,
            ffn_dim=ffn_dim,
            depthwise_conv_kernel_size=depthwise_conv_kernel_size,
            dropout=dropout
        )
        self.proba_appoximator_ = torch.nn.Linear(
            in_features=hidden_layer_size,
            out_features=vocabulary_size
        )
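
    def forward(self, inputs: torch.Tensor, input_lenghts: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, time, feature_vector_size); input_lenghts: (batch,)
        # BatchNorm1d/Conv1d expect channels first, so permute to (batch, features, time)
        hidden_states = torch.nn.functional.gelu(self.cnn_(inputs.permute(0, 2, 1)))
        # Back to (batch, time, hidden_layer_size), the layout torchaudio's Conformer expects
        hidden_states, _ = self.conformer_(hidden_states.permute(0, 2, 1), input_lenghts)
        # Per-frame logits: (batch, time, vocabulary_size)
        output_logits = self.proba_appoximator_(hidden_states)
        return output_logits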
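The CNN front-end uses padding='same'. Below is a minimal sketch of how that padding is split, assuming it mirrors torch.nn.Conv1d: the total padding dilation * (kernel_size - 1) is divided into total // 2 on the left and the remainder on the right. The helper names are hypothetical. When the total is odd (an even kernel_size with an odd dilation), the two sides differ. That is the case the padding='same' UserWarning quoted further down describes, and per its own text it concerns an extra zero-padded copy of the input rather than a different output length.

```python
from typing import Tuple


def same_padding_1d(kernel_size: int, dilation: int = 1) -> Tuple[int, int]:
    """Hypothetical helper: (left, right) zero-padding that keeps the length unchanged."""
    total = dilation * (kernel_size - 1)
    left = total // 2
    right = total - left  # any odd remainder ends up on the right side
    return left, right


def output_length(length: int, kernel_size: int, dilation: int = 1) -> int:
    """Output length of a stride-1 convolution after the padding above."""
    left, right = same_padding_1d(kernel_size, dilation)
    return length + left + right - dilation * (kernel_size - 1)


for k in (3, 4):
    left, right = same_padding_1d(k)
    print(f"kernel_size={k}: left={left}, right={right}, symmetric={left == right}")
```

With an odd kernel_size the padding is symmetric. With an even one it is not, but the output length still equals the input length.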

I have issues with the conversion and I'm not sure where they come from. I export the model as follows:

torch.onnx.export(model.cpu(),
                  f="../model.onnx",
                  input_names=["inputs", "input_lenghts"],
                  output_names=["logits"],
                  args=({ "inputs": torch.ones(1, 1, 13), "input_lenghts": torch.ones(1, dtype=torch.long) }),
                  dynamic_axes={
                    "inputs": { 0: "batch_count", 1: "batch_item_length" },
                    "input_lenghts": { 0: "batch_count" },
                    "logits": { 0: "batch_count", 1: "batch_item_length" }
                  },
                  verbose=True
                )

If I convert the model WITHOUT using

scripted_model = torch.jit.script(model)

the conversion itself appears to succeed, apart from this warning:

"UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\Convolution.cpp:1041."

However, when I test the exported model, it crashes with:

RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/conformer_/conformer_layers.0/self_attn/Reshape_4' Status Message: D:\bld\onnxruntime_1710148767998\work\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:45 onnxruntime::ReshapeHelper::ReshapeHelper input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{222,3,512}, requested shape:{1,24,64}

This looks as if my dynamic axes settings are incorrect, but I don't understand what is wrong with them. What bothers me even more is that if I change torch.ones(1, 1, 13) to, for example, torch.ones(100, 1, 13), it works (as expected), but with torch.ones(1, 100, 13) the conversion fails. Since both of the first two dimensions are declared as dynamic axes, I expected the exporter not to depend on their concrete values at all. Instead, the second dimension apparently can only be 1.
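Decoding the numbers in the ONNX Runtime error may help locate which axis was frozen. This sketch assumes the failing node is one of the reshapes inside torch.nn.functional.multi_head_attention_forward, which views a (length, batch, embed_dim) tensor as (length, batch * num_heads, head_dim). Under that assumption, embed_dim = 512 and head_dim = 64 give num_heads = 8. The requested {1, 24, 64} then means the batch axis stayed dynamic (3 * 8 = 24), while the sequence length was recorded as the constant 1 taken from the dummy input.

```python
from math import prod

# Shapes copied from the ONNX Runtime error message.
input_shape = (222, 3, 512)    # assumed (seq_len, batch, embed_dim) at the attention layer
requested_shape = (1, 24, 64)  # assumed (seq_len, batch * num_heads, head_dim) in the graph

seq_len, batch, embed_dim = input_shape
frozen_len, batch_times_heads, head_dim = requested_shape

# Inferred, not read from the model config: 512 // 64 = 8 heads.
num_heads = embed_dim // head_dim
expected_shape = (seq_len, batch * num_heads, head_dim)

print("elements in input:    ", prod(input_shape))
print("elements requested:   ", prod(requested_shape))
print("inferred num_heads:   ", num_heads)
print("shape that would fit: ", expected_shape)
print("batch axis dynamic?   ", batch_times_heads == batch * num_heads)
print("seq axis dynamic?     ", frozen_len == seq_len)
```

If this reading is right, the dynamic_axes entries themselves are being applied to the batch axis, and the problem is limited to how the time axis flows through the Conformer.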

Another possibility I considered is that I need to script the model first (although, according to the docs, I don't think my model requires it). But I can't export the scripted model at all: if I call scripted_model = torch.jit.script(model) and try to export scripted_model, I get

Conda\envs\torch\Lib\site-packages\torch\onnx\symbolic_opset9.py", line 7089, in prim_if
    torch._C._jit_pass_onnx_node_shape_type_inference(
RuntimeError: ScalarType UNKNOWN_SCALAR is an unexpected tensor scalar type

What is wrong with my conversion configuration?
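One unverified candidate is the way torchaudio's Conformer builds its key-padding mask. This sketch assumes the private helper _lengths_to_padding_mask in torchaudio/models/conformer.py computes max_length = int(torch.max(lengths).item()). If so, the mask width comes from the values in input_lenghts, not from the time dimension of inputs. A number obtained through .item() is recorded as a constant while tracing, so a dummy input_lenghts of torch.ones(1) would fix the width at 1. Paired with torch.ones(1, 100, 13), the mask would be 1 column wide against 100 frames, which would plausibly make the export fail. The function names below are hypothetical pure-Python stand-ins.

```python
from typing import List


def lengths_to_padding_mask(lengths: List[int]) -> List[List[bool]]:
    """Hypothetical pure-Python mirror of torchaudio's private _lengths_to_padding_mask.

    True marks padded positions. The width is max(lengths), i.e. it depends on
    the *values* in the lengths tensor, not on the shape of the inputs tensor.
    """
    max_length = max(lengths)  # assumed to correspond to int(torch.max(lengths).item())
    return [[t >= n for t in range(max_length)] for n in lengths]


def mask_matches_inputs(num_frames: int, lengths: List[int]) -> bool:
    """Attention needs one mask column per frame of the (batch, time, features) input."""
    mask = lengths_to_padding_mask(lengths)
    return all(len(row) == num_frames for row in mask)


# Dummy inputs from the question: lengths stay [1] while the time axis changes.
print(mask_matches_inputs(num_frames=1, lengths=[1]))     # torch.ones(1, 1, 13)   -> True
print(mask_matches_inputs(num_frames=100, lengths=[1]))   # torch.ones(1, 100, 13) -> False
# Lengths consistent with the time axis give a mask of the right width.
print(mask_matches_inputs(num_frames=100, lengths=[100]))  # -> True
```

This only illustrates the shape dependency. Even with lengths that match the time axis, a width read through .item() would stay fixed in a traced graph, so it is not by itself a fix.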
