AWS SageMaker multi-model endpoint additional dependencies


I am trying to deploy a multi-model endpoint on AWS SageMaker. However, some of my models have additional dependencies. I am following Hugging Face's documentation on creating user-defined code and requirements.

My zipped artifacts have a code/ directory with a requirements.txt, yet when I deploy the model and invoke it with the Python AWS SDK, I get ModuleNotFound errors during my imports.

I know it's finding my inference.py file, since it's failing to find the modules that inference.py imports.

It should be noted that the models I am deploying were trained and built outside of SageMaker, and I am trying to bring them into SageMaker.

The container image I am using is '763104351884.dkr.ecr.ca-central-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04'
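
For reference, this is roughly how I invoke the endpoint with boto3 (the endpoint name, target model name, and payload below are placeholders):

import json
import boto3

# Rough sketch of my invocation; endpoint name and target model are placeholders.
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",   # placeholder
    TargetModel="model-a.tar.gz",             # which artifact under the S3 prefix to load
    ContentType="application/json",
    Body=json.dumps({"inputs": ["some example text"]}),
)

print(json.loads(response["Body"].read()))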

There is 1 best solution below.

Answer from fucalost:

Hey Lucas

It seems to me that you're mixing up two different methods of deploying models in SageMaker.

If you want to create a Multi-Model Endpoint, unfortunately, you'll need to create a Docker image that adheres to SageMaker's requirements (e.g. which ports to expose). You can read more on this here.
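
For context, a multi-model endpoint is registered by pointing SageMaker at an S3 prefix that holds all of the tarballs and setting the container mode to MultiModel. A minimal sketch with boto3 (the image URI, role ARN, and bucket are placeholders, and the image itself must be MME-compatible):

import boto3

sm = boto3.client("sagemaker")

# Minimal sketch of registering the model object for a multi-model endpoint.
# Image URI, role ARN, and S3 prefix are placeholders.
sm.create_model(
    ModelName="my-multi-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<mme-compatible-image>:<tag>",
        "Mode": "MultiModel",                      # load individual models on demand
        "ModelDataUrl": "s3://my-bucket/models/",  # prefix containing every model.tar.gz
    },
)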

The Hugging Face guide you've been following is designed for single-model endpoints, and it does indeed let you use custom dependencies. You may wish to simply create single-model endpoints for all of your models by following the steps below:

  • Clone the model from Hugging Face using git
  • Create a code/ directory (within the model dir) and add an inference.py file
  • Include two functions in the inference file; they must be called model_fn() and predict_fn(). The former runs only when the endpoint is initialised and must return the model and tokeniser; the latter is called for each inference request. You can use predict_fn() to include custom logic.
  • Create a tarball (model.tar.gz) with all the model artefacts (incl. your custom inference code). It should be formatted as below.
model.tar.gz/
|- pytorch_model.bin
|- ....
|- code/
  |- inference.py
  |- requirements.txt 
  • Finally, upload the tarball to S3 and pass the S3 URI to SageMaker when creating a model/endpoint.
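
As a rough sketch of that last step with the SageMaker Python SDK (the role ARN, S3 URI, framework versions, and instance type are placeholders; the versions should match the DLC you're actually using):

from sagemaker.huggingface import HuggingFaceModel

# Rough sketch of deploying the tarball as a single-model endpoint.
# Role ARN, S3 URI, framework versions, and instance type are placeholders.
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/models/model.tar.gz",  # the tarball uploaded above
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)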

There's a great notebook from Hugging Face covering this whole process. It's the best guide I've been able to find so far. If you copy it word-for-word, and only modify the inference.py script, you should be successful.

Here's an example of an inference.py I've used previously; as you can see, Hugging Face Pipelines work too!

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification
from DirectQuoteUtils import reformat  # custom helper module shipped in the code/ directory
import torch

def model_fn(model_dir):
    # Runs once when the endpoint is initialised; returns the object passed to predict_fn().
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    pipe = pipeline("ner", model=model, tokenizer=tokenizer)
    return pipe

def predict_fn(data, pipe):
    # Runs for every inference request; `pipe` is the pipeline returned by model_fn().
    outputs = []
    
    # FORMAT FOR MODEL INPUT:
    # {               # list of strings
    #     "inputs": ["Donald Trump is the president of the US", "Joe Biden is the United States president"]
    # }
    
    modelData = pipe(data['inputs'])
    
    for prediction in modelData:
        cleanPred = reformat(prediction)  # custom post-processing from DirectQuoteUtils
        outputs.append(cleanPred)
        
    return {
        # "device": device, # handy to check if CUDA is being used
        "outputs": outputs
    }
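
If you deployed with the SageMaker Python SDK as sketched above, calling the endpoint then looks roughly like this (the payload shape depends entirely on your predict_fn(); these values are illustrative):

# Illustrative call against the predictor returned by huggingface_model.deploy().
result = predictor.predict({
    "inputs": [
        "Donald Trump is the president of the US",
        "Joe Biden is the United States president",
    ]
})
print(result["outputs"])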