I'm setting up a Python backend to test multi-model endpoints in AWS SageMaker, and I came up with minimal client code to invoke the inference and process the request/response for a multi-model endpoint. The example uses tritonclient.http; see below.

inputs.append(httpclient.InferInput('input_ids', [token_ids.shape[0], 512], "INT32"))

Can someone point me to documentation on this library/function? I'd like to understand what InferInput does (line above). Also, sorry for the basic numpy question: once the prediction is generated, we call prediction.as_numpy — I'm assuming this converts the result into some array type that the numpy library represents?
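For context on what I think the InferInput arguments mean, here's a toy numpy-only check (no Triton server needed) of my assumption that token_ids is a 2-D int32 array of shape [batch, 512], matching the shape list passed to InferInput — the values here are made up purely for illustration:

```python
import numpy as np

# Hypothetical batch of 1 sequence, padded/truncated to 512 tokens.
token_ids = np.zeros((1, 512), dtype=np.int32)

# InferInput('input_ids', [token_ids.shape[0], 512], "INT32") declares a
# tensor named 'input_ids' with this shape and datatype; my understanding
# is that the shape must match the array later passed via
# set_data_from_numpy:
shape = [token_ids.shape[0], 512]
assert list(token_ids.shape) == shape
```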
import numpy as np
import tritonclient.http as httpclient

from utils import preprocess, postprocess

# Client pointed at a locally running Triton inference server
triton_client = httpclient.InferenceServerClient(url='localhost:8000')

def text_process(text):
    # tokenizer is assumed to be defined/imported elsewhere (not shown here)
    return tokenizer.encode(text, maxlen=512)

if __name__ == "__main__":
    text_list = ['some text here']
    token_ids = text_process(text_list)

    # Declare the input tensor: name, shape [batch, 512], datatype INT32
    inputs = []
    inputs.append(httpclient.InferInput('input_ids', [token_ids.shape[0], 512], "INT32"))
    inputs[0].set_data_from_numpy(token_ids.astype(np.int32), binary_data=False)

    # Request the output tensor named 'output'
    outputs = []
    outputs.append(httpclient.InferRequestedOutput('output', binary_data=False))

    # Run inference against the 'sentence_classification' model
    prediction = triton_client.infer('sentence_classification', inputs=inputs, outputs=outputs)
    final_output = prediction.as_numpy("output")
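To illustrate the numpy part of my question, here's a toy sketch of how I'd expect to use the result, assuming (hypothetically) that as_numpy returns a plain numpy.ndarray of scores with shape [batch, num_classes] — the array below is a made-up stand-in, not real model output:

```python
import numpy as np

# Stand-in for prediction.as_numpy("output"); shape and values are
# invented purely for illustration.
final_output = np.array([[0.1, 2.3, -0.5]], dtype=np.float32)

# If as_numpy hands back an ordinary numpy.ndarray, standard numpy
# operations should apply directly, e.g. picking the highest-scoring class:
predicted_class = int(np.argmax(final_output, axis=-1)[0])
print(predicted_class)  # -> 1
```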