SageMaker Llama 2 endpoint without Hugging Face token

I have a model.tar.gz archive with the following files in it:

LICENSE.txt
README.md
Responsible-Use-Guide.pdf
USE_POLICY.md
checklist.chk
consolidated.00.pth
params.json
tokenizer.model
tokenizer_checklist.chk

This is from the model repo meta-llama/Llama-2-7b-chat. I want to deploy an endpoint using my model archive without going through Hugging Face: no token, no Hugging Face library. Ideally pure boto3 / SageMaker.

Any help is welcome.

import sagemaker
from sagemaker import Model
from sagemaker.huggingface import get_huggingface_llm_image_uri

instance_type = "ml.g5.2xlarge"
endpoint_name = "ss-llama2-endpoint"

# LMI (Large Model Inference) container image
model_image = get_huggingface_llm_image_uri("lmi")
model = Model(
    image_uri=model_image,
    model_data=path_to_model_data,  # S3 URI of my model.tar.gz
    role=sagemaker.get_execution_role(),
)
model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
)

I expected this to create an endpoint that I could then query with {"inputs": "some user question here"}.
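For reference, once the endpoint exists I plan to query it roughly like this with plain boto3 (endpoint name and payload as above):

import json
import boto3

# Query the deployed endpoint through the SageMaker runtime API.
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="ss-llama2-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "some user question here"}),
)
print(response["Body"].read().decode("utf-8"))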

In reality, the endpoint doesn't even get created. Maybe I am using the wrong image, or I am missing an inference script, or something like that.

1 Answer

The Hugging Face containers expect the model to be in Hugging Face format, meaning the archive should contain a config.json for the model. What you have is the raw Meta format, which may not work. You should probably download the model and weights from meta-llama/Llama-2-7b-chat-hf (the -hf suffix on the model name indicates Hugging Face format) and try that instead. Also check the endpoint's CloudWatch logs to get to the root of the issue.
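As a rough sketch of what that could look like (the S3 path is hypothetical, and I'm assuming the repacked model.tar.gz contains the Hugging Face-format files from that repo, i.e. config.json, tokenizer files, and the weight shards; setting HF_MODEL_ID to /opt/ml/model is the usual way to point the Hugging Face LLM container at local weights instead of the Hub, so no token is needed):

import sagemaker
from sagemaker import Model
from sagemaker.huggingface import get_huggingface_llm_image_uri

# Hypothetical S3 path; the archive must hold the HF-format files
# (config.json, tokenizer files, *.safetensors / *.bin weights).
path_to_model_data = "s3://my-bucket/llama2-7b-chat-hf/model.tar.gz"

model_image = get_huggingface_llm_image_uri("huggingface")  # TGI container

model = Model(
    image_uri=model_image,
    model_data=path_to_model_data,
    role=sagemaker.get_execution_role(),
    env={"HF_MODEL_ID": "/opt/ml/model"},  # serve the local copy, no Hub token
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="ss-llama2-endpoint",
)

Either way, the CloudWatch log stream for the endpoint will show whether the container actually managed to load the weights.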