The examples here (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/nlp_bert/triton_nlp_bert.ipynb) show that, instead of sending raw text and tokenizing it on the server, tokenization is done on the client side and the tokenized input is sent to the Triton server. Is it possible to pass text directly to the NLP BERT model, as in the example, and tokenize the text on the server? The default model.py only accepts a list of pb_utils.InferenceRequest as input. I assume this will need to be overridden or customized to accept other input types.
model.py
class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        pass

    def execute(self, requests):
        """`execute` must be implemented in every Python model. `execute`
        receives a list of pb_utils.InferenceRequest as its only
        argument. This function is called when an inference is requested
        for this model. Depending on the batching configuration (e.g. Dynamic
        Batching) used, `requests` may contain multiple requests. Every
        Python model must create one pb_utils.InferenceResponse for every
        pb_utils.InferenceRequest in `requests`. If there is an error, you can
        set the error argument when creating a pb_utils.InferenceResponse.

        Parameters
        ----------
        requests : list
            A list of pb_utils.InferenceRequest

        Returns
        -------
        list
            A list of pb_utils.InferenceResponse. The length of this list must
            be the same as `requests`.
        """
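If tokenization were moved into the server, `execute` would need to read a string input and decode it before tokenizing. Below is a minimal sketch of the decoding step only, under stated assumptions: the input tensor name `TEXT` is hypothetical (not from the example notebook), and the pb_utils and tokenizer calls are shown as comments because they only exist inside the Triton Python backend runtime.

```python
import numpy as np

# Hypothetical helper: Triton delivers TYPE_STRING inputs to the Python
# backend as a numpy object array of bytes; decode them to str before
# passing to a tokenizer.
def decode_text_batch(arr):
    return [t.decode("utf-8") if isinstance(t, bytes) else str(t)
            for t in arr.reshape(-1)]

# Sketch of how execute() could use it (assumes pb_utils is importable and
# a HuggingFace tokenizer was loaded in initialize(); names are illustrative):
#
#     in_tensor = pb_utils.get_input_tensor_by_name(request, "TEXT")
#     texts = decode_text_batch(in_tensor.as_numpy())
#     enc = self.tokenizer(texts, padding="max_length", max_length=128,
#                          truncation=True, return_tensors="np")
```

The model's config.pbtxt would also need the input declared as TYPE_STRING for this to apply.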
import json
import boto3

# `tokenize_text` and `endpoint_name` are defined elsewhere in the notebook.
client = boto3.client("sagemaker-runtime")

text_triton = "Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs."
input_ids, attention_mask = tokenize_text(text_triton)
payload = {
    "inputs": [
        {"name": "INPUT__0", "shape": [1, 128], "datatype": "INT32", "data": input_ids},
        {"name": "INPUT__1", "shape": [1, 128], "datatype": "INT32", "data": attention_mask},
    ]
}
response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/octet-stream", Body=json.dumps(payload)
)
print(json.loads(response["Body"].read().decode("utf8")))
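For comparison, if the server did the tokenizing, the client payload could carry raw text instead of token IDs. A hedged sketch, assuming a hypothetical input named "TEXT" declared as TYPE_STRING in config.pbtxt; strings are sent with datatype "BYTES" in Triton's KServe v2 JSON protocol:

```python
import json

# Hypothetical raw-text payload; "TEXT" is an assumed input name, not
# taken from the example notebook.
def build_text_payload(text):
    return json.dumps({
        "inputs": [
            {"name": "TEXT", "shape": [1, 1], "datatype": "BYTES", "data": [text]}
        ]
    })

# Usage would mirror the invoke_endpoint call above:
# response = client.invoke_endpoint(
#     EndpointName=endpoint_name,
#     ContentType="application/octet-stream",
#     Body=build_text_payload("Triton Inference Server provides ..."),
# )
```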
I created and tested a multi-model endpoint in AWS SageMaker with the NVIDIA Triton server, using tokenized text as inputs.