How to increase AWS Sagemaker invocation time out while waiting for a response


I deployed a large 3D model to AWS SageMaker. Inference takes 2 minutes or more, and I get the following error when calling the predictor from Python:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."

In CloudWatch I also see some ping timeouts while the container is processing:

2020-10-07T16:02:39.718+02:00 2020/10/07 14:02:39 [error] 106#106: *251 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock/ping", host: "model.aws.local:8080"

How do I increase the invocation timeout?

Or is there a way to make asynchronous invocations to a SageMaker endpoint?


There are 2 best solutions below

BEST ANSWER

It's currently not possible to increase the timeout; this is tracked in an open GitHub issue. Judging from that issue and similar questions on Stack Overflow, you may be able to use Batch Transform instead of real-time inference for long-running requests.

References

https://stackoverflow.com/a/55642675/806876

SageMaker Python SDK timeout issue: https://github.com/aws/sagemaker-python-sdk/issues/1119
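Batch Transform sidesteps the real-time invocation timeout because results are written to S3 offline instead of being returned over a synchronous HTTP call. A minimal sketch of building a `create_transform_job` request for the `boto3` SageMaker client; the model name, job name, S3 URIs, and instance type below are placeholders, not values from the question:

```python
def build_transform_job(model_name, job_name, input_s3, output_s3):
    """Build the request dict for a SageMaker Batch Transform job.

    Batch Transform runs inference as an offline job and writes results
    to S3, so the real-time invocation timeout does not apply.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3,
                }
            },
            "ContentType": "application/json",  # assumed payload type
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {
            "InstanceType": "ml.m5.xlarge",  # placeholder instance type
            "InstanceCount": 1,
        },
    }

# Usage (requires AWS credentials; names are hypothetical):
# import boto3
# request = build_transform_job("my-3d-model", "my-transform-job",
#                               "s3://my-bucket/input/",
#                               "s3://my-bucket/output/")
# boto3.client("sagemaker").create_transform_job(**request)
```

The job reads every object under the input prefix, invokes the same container you deployed, and writes one output object per input to the output path.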

SECOND ANSWER

This timeout is actually enforced on the server side, by the endpoint specifically. You can try the bring-your-own-container (BYOC) approach; that way you get full control over everything on the endpoint side, including the timeout.

You can also reference the endpoint part of this repo, which is from one of my colleagues: https://github.com/jackie930/yolov4-SageMaker

The timeout you need to change is in serve.py:

model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
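In the usual BYOC serving pattern, serve.py reads that environment variable and passes it to gunicorn as its worker timeout, so setting MODEL_SERVER_TIMEOUT on the container raises the limit without rebuilding the image. A sketch of that wiring, under the assumption that the linked repo follows the common SageMaker BYOC layout (the exact script there may differ):

```python
import os

def gunicorn_command():
    """Build the gunicorn launch command a BYOC serve script would run.

    MODEL_SERVER_TIMEOUT and MODEL_SERVER_WORKERS are read from the
    container environment, so they can be set via the Environment field
    when creating the SageMaker model, without changing the image.
    """
    timeout = int(os.environ.get("MODEL_SERVER_TIMEOUT", 60))
    workers = int(os.environ.get("MODEL_SERVER_WORKERS", 1))
    return [
        "gunicorn",
        "--timeout", str(timeout),          # kill workers slower than this
        "-b", "unix:/tmp/gunicorn.sock",    # socket nginx proxies to
        "-w", str(workers),
        "wsgi:app",                         # hypothetical WSGI entry point
    ]

# With MODEL_SERVER_TIMEOUT=300 in the container environment, the server
# is launched with "--timeout 300" and long inferences are not killed
# by gunicorn after 60 seconds.
```

Note that even with a longer gunicorn timeout, the InvokeEndpoint call itself still has a platform-side limit, which is why the Batch Transform route in the other answer exists.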