How to use a packaged model tar.gz inside a SageMaker Processing job?


I am working on deploying a full ML pipeline using SageMaker and Airflow. I would like to separate the training and processing parts of the pipeline.

I have a question concerning the SageMakerProcessingOperator (source_code). This operator relies on the create_processing_job() function. When using it, I would like to extend the base Docker image used for processing so that it runs a home-made script. Currently, the processing works fine when I push my container to AWS ECR. However, I would prefer to use part of the script stored inside my packaged model (in tar.gz format).

For training and registering the model, we can point the framework image at our own code by setting the sagemaker_submit_directory and SAGEMAKER_PROGRAM environment variables (cf. aws_doc). However, this does not seem to be possible with the SageMakerProcessingOperator. Below is an extract of the config used in the operator, with no success yet.

"Environment": {
    "sagemaker_enable_cloudwatch_metrics": "false",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": f"{self.region_name}",
    "SAGEMAKER_SUBMIT_DIRECTORY": f"{self.train_code_path}",
    "SAGEMAKER_PROGRAM": f"{self.processing_entry_point}",
    "sagemaker_job_name": f"{self.process_job_name}",
},
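
For context, here is a rough sketch of where that Environment block sits in the full config dict handed to SageMakerProcessingOperator (the field names follow the CreateProcessingJob request; the role ARN, image URI and instance settings below are placeholders, not my real values):

process_config = {
    "ProcessingJobName": f"{self.process_job_name}",
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerProcessingRole",  # placeholder
    "AppSpecification": {
        # custom image pushed to AWS ECR (placeholder URI)
        "ImageUri": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-processing-image:latest",
    },
    "Environment": {
        # ... the Environment block from the extract above ...
    },
    "ProcessingResources": {
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

The operator is then created with SageMakerProcessingOperator(task_id="processing", config=process_config, ...); the job runs, but the two SAGEMAKER_* variables do not seem to have any effect.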

Did anyone manage to use these parameters with SageMaker create_processing_job()? Or is custom code limited to what is baked into the AWS ECR image?

1 Answer

SageMaker Processing jobs and SageMaker Training jobs are different services with different underlying architectures, so the two cannot be combined: create_processing_job() has no notion of a submit directory or entry-point script, and those SAGEMAKER_* variables are only understood by the toolkit that runs inside training containers.
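
To illustrate the difference: a Processing job receives code through ProcessingInputs and the command to run through AppSpecification.ContainerEntrypoint, rather than through SAGEMAKER_SUBMIT_DIRECTORY / SAGEMAKER_PROGRAM. Below is a rough boto3 sketch of that pattern (the bucket, image URI, role, paths and script name are placeholders, and it assumes the image contains bash, tar and python):

import boto3

sm = boto3.client("sagemaker")

sm.create_processing_job(
    ProcessingJobName="my-processing-job",                            # placeholder
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",           # placeholder
    AppSpecification={
        "ImageUri": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest",  # placeholder
        # Unpack the archive copied in by the "code" input below, then run the script.
        "ContainerEntrypoint": [
            "bash", "-c",
            "tar -xzf /opt/ml/processing/code/model.tar.gz -C /opt/ml/processing/code"
            " && python /opt/ml/processing/code/processing_entry_point.py",
        ],
    },
    ProcessingInputs=[
        {
            "InputName": "code",
            "S3Input": {
                "S3Uri": "s3://my-bucket/models/model.tar.gz",        # the packaged tar.gz
                "LocalPath": "/opt/ml/processing/code",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        },
    ],
    ProcessingResources={
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)

The same dictionary shape can be passed as the config of SageMakerProcessingOperator, since the operator forwards it to create_processing_job().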