Converting a Triton container to work with SageMaker MME

I have a custom Triton Docker container that uses a Python backend. This container works perfectly when run locally.

Here is the container's Dockerfile (I have omitted irrelevant parts).

ARG TRITON_RELEASE_VERSION=22.12
FROM nvcr.io/nvidia/tritonserver:${TRITON_RELEASE_VERSION}-pyt-python-py3

LABEL owner='toing'
LABEL maintainer='[email protected]'

LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

ARG TRITON_RELEASE_VERSION

ENV DEBIAN_FRONTEND=noninteractive
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

ENV GIT_TRITON_RELEASE_VERSION="r$TRITON_RELEASE_VERSION"
ENV TRITON_MODEL_DIRECTORY="/opt/ml/model"

SHELL ["/bin/bash", "-c"]

# nvidia updated their repository keys recently
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    # generic requirements
    gcc \
    libgl1-mesa-glx

RUN pip install --upgrade pip && \
    pip install --no-cache-dir setuptools \
    scikit-build \
    opencv-python-headless \
    cryptography

# create the model directory
RUN mkdir -p $TRITON_MODEL_DIRECTORY

# for mmcv installation
ENV FORCE_CUDA="1"

# set TORCH_CUDA_ARCH_LIST
ENV TORCH_CUDA_ARCH_LIST="7.5"

RUN pip install --no-cache-dir what-i-need --index-url 

# install pytorch requirements from aws
RUN mkdir -p /app/snapshots && \
    mkdir -p /keys

# Copy the requirements files
ADD requirements/build.txt /install/build.txt

# install specific packages
RUN pip install --no-cache-dir -r /install/build.txt

# number of workers per model
ENV SAGEMAKER_MODEL_SERVER_WORKERS=1
ENV SAGEMAKER_BIND_TO_PORT=8000
ENV SAGEMAKER_SAFE_PORT_RANGE=8000-8002

# HTTP Inference Service
EXPOSE 8000

# GRPC Inference Service
EXPOSE 8001

# Metrics Service
EXPOSE 8002

RUN echo -e "#!/bin/bash\n\
tritonserver --model-repository ${TRITON_MODEL_DIRECTORY}"\
>> /start.sh

RUN chmod +x /start.sh

# Set the working directory to /
WORKDIR /

ENTRYPOINT ["/start.sh"]

The problem is that when I launch it from the SageMaker MME endpoint, the Triton server starts and runs, but apparently SageMaker fails to detect the running server, so the health checks fail and the endpoint creation fails.

Am I using the wrong port, or what should I do to avoid this error?

PS: I did see that the base NGC container used in this Dockerfile has an entrypoint at /opt/nvidia/nvidia_entrypoint.sh, but that script seems to be just a wrapper around the original entrypoint.

Accepted answer:

The problem was that SageMaker requires Triton to listen on port 8080:

ENV SAGEMAKER_MULTI_MODEL=true
ENV SAGEMAKER_BIND_TO_PORT=8080

EXPOSE 8080

and that Triton needs to run in SageMaker mode with --allow-sagemaker=true. The command needed to run this was found on this link.

RUN echo -e "#!/bin/bash\n\
tritonserver --allow-sagemaker=true --allow-grpc=false --allow-http=false --allow-metrics=false --model-control-mode=explicit  --model-repository ${TRITON_MODEL_DIRECTORY}"\
>> /start.sh

So I adapted this to my Dockerfile, and Triton was able to start up with SageMaker.
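
For reference, a minimal sketch of how the adapted tail of the Dockerfile might look, combining the settings above with the model directory and start.sh path from the question (the safe port range and worker-count variables from the original are left out here for brevity):

ENV TRITON_MODEL_DIRECTORY="/opt/ml/model"

# SageMaker MME settings: health checks and invocations go to port 8080
ENV SAGEMAKER_MULTI_MODEL=true
ENV SAGEMAKER_BIND_TO_PORT=8080

EXPOSE 8080

# start Triton in SageMaker mode only; the plain HTTP, gRPC and metrics endpoints are disabled
RUN echo -e "#!/bin/bash\n\
tritonserver --allow-sagemaker=true --allow-grpc=false --allow-http=false --allow-metrics=false --model-control-mode=explicit --model-repository ${TRITON_MODEL_DIRECTORY}"\
>> /start.sh

RUN chmod +x /start.sh

WORKDIR /
ENTRYPOINT ["/start.sh"]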

PS: When using a custom Python stub, there is an open issue where S3 removes execution permissions from the stub. To avoid this, I had to put the Python stub directly at /opt/tritonserver/backends/python/triton_python_backend_stub instead of in the model tar.gz file as recommended in the documentation. However, this does not work when different models need different stubs.
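
A minimal sketch of that workaround, assuming the prebuilt stub is shipped in the build context under the (hypothetical) name triton_python_backend_stub:

# copy the custom Python stub into the backend directory so it keeps its
# execution permission, instead of packaging it inside the model tar.gz
COPY triton_python_backend_stub /opt/tritonserver/backends/python/triton_python_backend_stub
RUN chmod +x /opt/tritonserver/backends/python/triton_python_backend_stub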