I'm trying to save a Keras model which uses a SentencepieceTokenizer.
Everything is working so far, but I am unable to save the Keras model.
After training the sentencepiece model, I create the Keras model, call it with some examples first, and then try to save it like so:
proto = tf.io.gfile.GFile(model_path, "rb").read()
model = Model(tokenizer=proto)
embed = model(examples)
assert embed.shape[0] == len(examples)
model.save("embed_model")
The model itself is straightforward and looks like this:
class Model(keras.Model):
    def __init__(self, tokenizer: spm.SentencePieceProcessor, embed_size: int = 32, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tokenizer = tf_text.SentencepieceTokenizer(model=tokenizer, nbest_size=1)
        self.embeddings = layers.Embedding(input_dim=self.tokenizer.vocab_size(), output_dim=embed_size)

    def call(self, inputs, training=None, mask=None):
        x = self.tokenizer.tokenize(inputs)
        if isinstance(x, tf.RaggedTensor):
            x = x.to_tensor()
        x = self.embeddings(x)
        return x
The error I am getting is:
TypeError: Failed to convert elements of [None, None] to Tensor.
Consider casting elements to a supported type.
See https://www.tensorflow.org/api_docs/python/tf/dtypes for supported TF dtypes.
It appears to me as if the model literally gets called with model([None, None]) after calling model.save().
To be precise, the error appears to occur in ragged_tensor.convert_to_tensor_or_ragged_tensor(input):
E TypeError: Exception encountered when calling layer "model" (type Model).
E
E in user code:
E
E File "/home/sfalk/workspaces/technical-depth/ris-ml/tests/ris/ml/text/test_tokenizer.py", line 20, in call *
E x = self.tokenizer.tokenize(inputs)
E File "/home/sfalk/miniconda3/envs/ris-ml/lib/python3.10/site-packages/tensorflow_text/python/ops/sentencepiece_tokenizer.py", line 133, in tokenize *
E input_tensor = ragged_tensor.convert_to_tensor_or_ragged_tensor(input)
E
E TypeError: Failed to convert elements of [None, None] to Tensor. Consider casting elements to a supported type. See https://www.tensorflow.org/api_docs/python/tf/dtypes for supported TF dtypes.
E
E
E Call arguments received by layer "model" (type Model):
E • inputs=['None', 'None']
E • training=False
E • mask=None
/tmp/__autograph_generated_file99ftv9jw.py:22: TypeError
Maybe try defining an input_signature for the call method. Also call self.tokenizer.vocab_size().numpy() instead of self.tokenizer.vocab_size(), since eager tensors are not serializable.

Note that I removed the training parameter from the call method, since it already exists. Also, you can set self.built = True in the constructor so you will not have to call your model on actual data, but it is up to you.

Oh, and you might have to change the input_signature depending on which tokenizer model you are using.