I have saved a pre-trained version of DistilBERT, distilbert-base-uncased-finetuned-sst-2-english, from the Hugging Face model hub, and I am attempting to serve it via TensorFlow Serving and make predictions. Everything is currently being tested in Colab.
I am having trouble getting the prediction request into the format the model expects. TensorFlow Serving is up and running and serving the model, but my prediction code is not correct, and I need help understanding how to make a prediction via JSON over the REST API.
# tokenize and encode a simple positive instance
instances = tokenizer.tokenize('this is the best day of my life!')
instances = tokenizer.encode(instances)
data = json.dumps({"signature_name": "serving_default", "instances": instances, })
print(data)
{"signature_name": "serving_default", "instances": [101, 2023, 2003, 1996, 2190, 2154, 1997, 2026, 2166, 999, 102]}
# setup json_response object
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/my_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)
predictions
{'error': '{{function_node __inference__wrapped_model_52602}} {{function_node __inference__wrapped_model_52602}} Incompatible shapes: [11,768] vs. [1,5,768]\n\t [[{{node tf_distil_bert_for_sequence_classification_3/distilbert/embeddings/add}}]]\n\t [[StatefulPartitionedCall/StatefulPartitionedCall]]'}
Any direction here would be appreciated.
I was able to find the solution by saving the model with an explicit signature for the input ids and the attention mask. This is a simple implementation that uses a fixed input shape for the saved model and requires you to pad the inputs to the expected sequence length of 384. I have seen implementations that use custom signatures and custom model creation to match the expected input shapes, but this simple case worked for what I wanted to accomplish: serving a Hugging Face model via TF Serving. If anyone has better examples or ways to extend this, please post them for future use.
By calling get_concrete_function, we trace-compile the TensorFlow operations of the model for an input signature composed of two Tensors of shape [None, 384], the first one being the input ids and the second one the attention mask.
save the model with the signatures:
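A minimal sketch of this export step, assuming TensorFlow 2.x and a transformers version whose TF models accept a list of tensors as the first argument to `call`. The function name `export_with_signature` and the export path in the usage note are hypothetical, and the heavy imports are deferred so the module loads without them.

```python
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MAX_LEN = 384  # fixed sequence length baked into the serving signature


def export_with_signature(export_dir, model_name=MODEL_NAME, max_len=MAX_LEN):
    """Save the model with a signature of two [None, max_len] int32 inputs."""
    # Imported lazily: both packages are large and only needed at export time.
    import tensorflow as tf
    from transformers import TFDistilBertForSequenceClassification

    model = TFDistilBertForSequenceClassification.from_pretrained(model_name)

    # Trace model.call for a list of two tensors: input ids, then attention mask.
    traced = tf.function(model.call)
    concrete_function = traced.get_concrete_function(
        [
            tf.TensorSpec([None, max_len], tf.int32, name="input_ids"),
            tf.TensorSpec([None, max_len], tf.int32, name="attention_mask"),
        ]
    )

    # The traced function becomes the "serving_default" signature.
    tf.saved_model.save(model, export_dir, signatures=concrete_function)
    return export_dir
```

For example, `export_with_signature("distilbert_sst2/1")` would write version 1 of the model; TF Serving expects the numeric version subdirectory.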
check to see that it contains the correct signature:
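One way to do this check is the `saved_model_cli` tool that ships with TensorFlow. The sketch below wraps it in Python so it can be run from a notebook cell; the helper names are hypothetical, and it assumes `saved_model_cli` is on the PATH.

```python
import subprocess


def show_signature_command(export_dir, signature="serving_default"):
    """Build the saved_model_cli invocation that prints one signature."""
    return [
        "saved_model_cli", "show",
        "--dir", export_dir,
        "--tag_set", "serve",
        "--signature_def", signature,
    ]


def show_signature(export_dir):
    """Run the command and return its text output (raises if the CLI fails)."""
    result = subprocess.run(
        show_signature_command(export_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```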
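The output should list two inputs, input_ids and attention_mask, each an int32 tensor with a variable batch dimension and a fixed length of 384.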
TEST MODEL:
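A rough sketch of a local test before involving TF Serving: load the SavedModel, pad one sentence to 384 tokens with the tokenizer, and call the serving signature directly. The names `predict_locally` and `softmax` are hypothetical. The output key of the signature is not assumed; the first returned tensor is taken as the logits.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_locally(export_dir, text,
                    model_name="distilbert-base-uncased-finetuned-sst-2-english",
                    max_len=384):
    """Run one sentence through the exported serving signature."""
    # Imported lazily: only needed when the model is actually called.
    import tensorflow as tf
    from transformers import DistilBertTokenizerFast

    tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)
    encoded = tokenizer(text, padding="max_length", truncation=True,
                        max_length=max_len, return_tensors="tf")

    serving_fn = tf.saved_model.load(export_dir).signatures["serving_default"]
    outputs = serving_fn(
        input_ids=tf.cast(encoded["input_ids"], tf.int32),
        attention_mask=tf.cast(encoded["attention_mask"], tf.int32),
    )
    # Take the first output tensor, whatever the exporter named it.
    logits = next(iter(outputs.values())).numpy()[0].tolist()
    return softmax(logits)
```

For this checkpoint the second probability should correspond to the POSITIVE label, but that ordering is worth confirming against the `id2label` entry in the model's config.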
FOR TF SERVING (in Colab, which was my original intent):
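A sketch of starting the server from a notebook, assuming the `tensorflow_model_server` binary is already installed in the Colab VM. The helper names and log file name are hypothetical; the model name `my_model` and port 8501 match the request URL used in the question.

```python
import os
import subprocess


def serving_command(model_base_path, model_name="my_model", rest_port=8501):
    """Build the tensorflow_model_server command line for the REST API."""
    return [
        "tensorflow_model_server",
        f"--rest_api_port={rest_port}",
        f"--model_name={model_name}",
        # Must be an absolute path to the directory that CONTAINS the
        # numeric version folder (e.g. ".../distilbert_sst2", not ".../1").
        f"--model_base_path={os.path.abspath(model_base_path)}",
    ]


def start_server(model_base_path, log_path="server.log", **kwargs):
    """Launch the server in the background and return the process handle."""
    log_file = open(log_path, "w")
    return subprocess.Popen(
        serving_command(model_base_path, **kwargs),
        stdout=log_file, stderr=subprocess.STDOUT,
    )
```

If requests fail after startup, the log file is the first place to look for loading errors.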
MAKE A POST REQUEST:
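A minimal sketch of the request, assuming the model was saved with the two named inputs described above. Each instance is a JSON object keyed by input name, and both lists are padded to length 384. The helper names are hypothetical; the pad id of 0 is the [PAD] token in the uncased BERT vocabulary.

```python
import json

MAX_LEN = 384
PAD_ID = 0  # [PAD] token id in the uncased vocabulary


def build_predict_payload(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Pad one encoded sentence and wrap it in TF Serving's row format."""
    # Plain slicing for brevity; over-long inputs would lose their final [SEP].
    ids = list(token_ids)[:max_len]
    padding = max_len - len(ids)
    instance = {
        "input_ids": ids + [pad_id] * padding,
        # 1 marks real tokens, 0 marks padding positions.
        "attention_mask": [1] * len(ids) + [0] * padding,
    }
    return json.dumps({"signature_name": "serving_default",
                       "instances": [instance]})


def post_prediction(token_ids,
                    url="http://localhost:8501/v1/models/my_model:predict"):
    """Send the payload to the REST endpoint and return the predictions."""
    import requests  # imported lazily; only needed for the network call

    response = requests.post(url, data=build_predict_payload(token_ids),
                             headers={"content-type": "application/json"})
    response.raise_for_status()
    return response.json()["predictions"]
```

Here `token_ids` is the output of `tokenizer.encode(...)`, for example the list `[101, 2023, 2003, 1996, 2190, 2154, 1997, 2026, 2166, 999, 102]` from the question. The key difference from the original request is that `instances` holds a list of objects with both inputs, rather than a flat list of ids.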