How to construct input/output for the NVIDIA Triton Python client to invoke a multi-model endpoint?


I am setting up a Python backend to test out multi-model endpoints in AWS SageMaker, and I came up with minimal client code to invoke the endpoint and process the request/response for inference. The example uses tritonclient.http; see below:

    inputs.append(httpclient.InferInput('input_ids', [token_ids.shape[0], 512], "INT32"))

Can someone point me to documentation for this library/function? I would like to understand what InferInput does (line above). Also, sorry for the basic numpy question: once the prediction is generated we call prediction.as_numpy, and I am assuming this converts the result into an array that the numpy library represents.
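For context, my current reading is that InferInput only declares the tensor's name, shape and datatype, and that (with binary_data=False) set_data_from_numpy embeds the values directly in the JSON request body. Below is a rough sketch of what I think the request ends up looking like, based on the v2 inference protocol that Triton's HTTP endpoint uses; the exact field layout and the token id values are my assumption, not something I have verified:

    # My guess at the JSON body that tritonclient.http builds for the call above,
    # assuming binary_data=False so the tensor values are sent inline as JSON.
    request_body = {
        "inputs": [
            {
                "name": "input_ids",        # must match the input name in the model config
                "shape": [1, 512],          # batch of 1, sequence length 512
                "datatype": "INT32",
                "data": [101, 2023, 3793],  # flattened token ids (placeholder values, truncated)
            }
        ],
        "outputs": [
            {"name": "output"}              # the tensor requested via InferRequestedOutput
        ],
    }

The full client code is below for reference: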

import numpy as np
import tritonclient.http as httpclient
from utils import preprocess, postprocess

# client pointed at the local Triton HTTP endpoint
triton_client = httpclient.InferenceServerClient(url='localhost:8000')

def text_process(text):
    # tokenizer is defined elsewhere in my project (omitted here);
    # it returns token ids padded/truncated to length 512
    return tokenizer.encode(text, maxlen=512)

if __name__ == "__main__":

    text_list = ['some text here']
    token_ids = text_process(text_list)

    # declare the input tensor: name, shape and datatype
    inputs = []
    inputs.append(httpclient.InferInput('input_ids', [token_ids.shape[0], 512], "INT32"))

    # attach the actual values from the numpy array (sent as JSON, not binary)
    inputs[0].set_data_from_numpy(token_ids.astype(np.int32), binary_data=False)

    # request the output tensor named 'output'
    outputs = []
    outputs.append(httpclient.InferRequestedOutput('output', binary_data=False))

    prediction = triton_client.infer('sentence_classification', inputs=inputs, outputs=outputs)

    final_output = prediction.as_numpy("output")
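
To confirm my numpy assumption, this is roughly what I plan to do with the result. I am assuming here that the 'output' tensor holds per-class logits of shape (batch_size, num_classes), which may not match the actual model:

    # prediction.as_numpy("output") should give a plain numpy.ndarray,
    # so standard numpy operations apply.
    logits = prediction.as_numpy("output")
    print(type(logits), logits.shape, logits.dtype)

    # softmax over the class dimension, then pick the highest-scoring class
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = probs / probs.sum(axis=-1, keepdims=True)
    predicted_class = probs.argmax(axis=-1)
    print(predicted_class)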