AWS SageMaker multi-model endpoint additional dependencies


I am trying to deploy a multi-model endpoint on AWS SageMaker. However, some of my models have additional dependencies. I am following Hugging Face's documentation on creating user-defined code and requirements.

My zipped artifacts have a code/ directory with a requirements.txt, yet when I deploy the model and invoke it with the Python AWS SDK, I get ModuleNotFound errors during my imports.

I know it's finding my inference.py file, since it's failing to find the modules that inference.py imports.

It should be noted that the models I am deploying were trained and built outside of SageMaker, and I am trying to bring them into SageMaker.

The container image I am using is '763104351884.dkr.ecr.ca-central-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04'
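
For reference, this is roughly how I invoke the endpoint with boto3 (the endpoint name, target model name, and payload below are placeholders):

import json
import boto3

# Rough sketch of my invocation; endpoint name and target model are placeholders.
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",   # placeholder
    TargetModel="model-a.tar.gz",             # which artifact under the S3 prefix to load
    ContentType="application/json",
    Body=json.dumps({"inputs": ["some example text"]}),
)

print(json.loads(response["Body"].read()))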

There is 1 best solution below.

Answer from fucalost:

Hey Lucas

It seems to me that you're mixing up two different methods of deploying models in SageMaker.

If you want to create a Multi-Model Endpoint, unfortunately, you'll need to create a Docker image that adheres to SageMaker's requirements (e.g. which ports to expose). You can read more on this here.
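
For context, a multi-model endpoint is registered by pointing SageMaker at an S3 prefix that holds all of the tarballs and setting the container mode to MultiModel. A minimal sketch with boto3 (the image URI, role ARN, and bucket are placeholders, and the image itself must be MME-compatible):

import boto3

sm = boto3.client("sagemaker")

# Minimal sketch of registering the model object for a multi-model endpoint.
# Image URI, role ARN, and S3 prefix are placeholders.
sm.create_model(
    ModelName="my-multi-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<mme-compatible-image>:<tag>",
        "Mode": "MultiModel",                      # load individual models on demand
        "ModelDataUrl": "s3://my-bucket/models/",  # prefix containing every model.tar.gz
    },
)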

The Hugging Face guide you've been following is designed for single-model endpoints, and it does indeed let you use custom dependencies. You may wish to simply create single-model endpoints for all of your models by following the steps below:

  • Clone the model from Hugging Face using git
  • Create a code/ directory (within the model dir) and add an inference.py file
  • Include two functions in the inference file; they must be called model_fn() and predict_fn(). The former runs only when the endpoint is initialised and must return the model and tokeniser; the latter is called for each inference request. You can use predict_fn() to include custom logic.
  • Create a tarball (model.tar.gz) with all the model artefacts (incl. your custom inference code). It should be formatted as below.
model.tar.gz/
|- pytorch_model.bin
|- ....
|- code/
  |- inference.py
  |- requirements.txt 
  • Finally, upload the tarball to S3 and pass the S3 URI to SageMaker when creating a model/endpoint.
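
As a rough sketch of that last step with the SageMaker Python SDK (the role ARN, S3 URI, framework versions, and instance type are placeholders; the versions should match the DLC you're actually using):

from sagemaker.huggingface import HuggingFaceModel

# Rough sketch of deploying the tarball as a single-model endpoint.
# Role ARN, S3 URI, framework versions, and instance type are placeholders.
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/models/model.tar.gz",  # the tarball uploaded above
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)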

There's a great notebook from Hugging Face covering this whole process. It's the best guide I've been able to find so far. If you copy it word-for-word, and only modify the inference.py script, you should be successful.

Here's an example of an inference.py I've used previously; as you can see, Hugging Face Pipelines work too!

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification
from DirectQuoteUtils import reformat  # custom helper module shipped in the code/ directory
import torch

def model_fn(model_dir):
    # Runs once when the endpoint is initialised; returns the object passed to predict_fn().
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    pipe = pipeline("ner", model=model, tokenizer=tokenizer)
    return pipe

def predict_fn(data, pipe):
    # Runs for every inference request; `pipe` is the pipeline returned by model_fn().
    outputs = []
    
    # FORMAT FOR MODEL INPUT:
    # {               # list of strings
    #     "inputs": ["Donald Trump is the president of the US", "Joe Biden is the United States president"]
    # }
    
    modelData = pipe(data['inputs'])
    
    for prediction in modelData:
        cleanPred = reformat(prediction)  # custom post-processing from DirectQuoteUtils
        outputs.append(cleanPred)
        
    return {
        # "device": device, # handy to check if CUDA is being used
        "outputs": outputs
    }
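
If you deployed with the SageMaker Python SDK as sketched above, calling the endpoint then looks roughly like this (the payload shape depends entirely on your predict_fn(); these values are illustrative):

# Illustrative call against the predictor returned by huggingface_model.deploy().
result = predictor.predict({
    "inputs": [
        "Donald Trump is the president of the US",
        "Joe Biden is the United States president",
    ]
})
print(result["outputs"])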