How to work with text input directly in Triton server?


The examples here (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/nlp_bert/triton_nlp_bert.ipynb) show that, instead of sending text and tokenizing it on the server, tokenization is done on the client side and the tokenized input is sent to the Triton server. Is it possible to pass text in directly for the NLP BERT model from the example and tokenize the text on the server? The default model.py only accepts a list of pb_utils.InferenceRequest as input, so I assume it will need to be overridden or customized to accept string inputs (a sketch of such an override follows the default model.py below).

model.py

class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        # The default template leaves this empty; `pass` keeps it valid Python.
        pass

    def execute(self, requests):
        """`execute` must be implemented in every Python model. `execute`
        function receives a list of pb_utils.InferenceRequest as the only
        argument. This function is called when an inference is requested
        for this model. Depending on the batching configuration (e.g. Dynamic
        Batching) used, `requests` may contain multiple requests. Every
        Python model must create one pb_utils.InferenceResponse for every
        pb_utils.InferenceRequest in `requests`. If there is an error, you can
        set the error argument when creating a pb_utils.InferenceResponse.
        Parameters
        ----------
        requests : list
          A list of pb_utils.InferenceRequest
        Returns
        -------
        list
          A list of pb_utils.InferenceResponse. The length of this list must
          be the same as `requests`
        """
For reference, here is the current client-side flow I use, which tokenizes before calling the endpoint:

import json

import boto3

# SageMaker runtime client; endpoint_name is defined earlier in the notebook.
client = boto3.client("sagemaker-runtime")

text_triton = "Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs."
# tokenize_text is the notebook's client-side helper that returns 128-token
# input_ids and attention_mask sequences.
input_ids, attention_mask = tokenize_text(text_triton)

payload = {
    "inputs": [
        {"name": "INPUT__0", "shape": [1, 128], "datatype": "INT32", "data": input_ids},
        {"name": "INPUT__1", "shape": [1, 128], "datatype": "INT32", "data": attention_mask},
    ]
}

response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/octet-stream", Body=json.dumps(payload)
)

print(json.loads(response["Body"].read().decode("utf8")))
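
With a server-side tokenizer like the sketch above, I would expect the client call to shrink to sending the raw string. In Triton's KServe v2 JSON protocol a string tensor uses datatype BYTES, with the text carried as a plain JSON string (TEXT again being the placeholder input name):

payload = {
    "inputs": [
        # BYTES tensors are sent as plain JSON strings in the v2 protocol;
        # "TEXT" must match the TYPE_STRING input of the tokenizer model.
        {"name": "TEXT", "shape": [1], "datatype": "BYTES", "data": [text_triton]}
    ]
}

response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/octet-stream", Body=json.dumps(payload)
)

print(json.loads(response["Body"].read().decode("utf8")))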

I have created and tested the multi-model endpoint in AWS SageMaker with the NVIDIA Triton server, with tokenized text as inputs.
