How to build Docker images for multiple SageMaker training applications?


Given an ML repository that contains 3 training applications, briefly:

root
  |__Dockerfile
  |__requirements.txt (contains heavy dependencies, e.g., numpy, sklearn, needed for all 3 apps)
  |__app_0
  |    |__training_0.py
  |    |__Dockerfile0
  |__app_1
  |    |__training_1.py
  |    |__Dockerfile1
  |__app_2
  |    |__training_2.py
  |    |__Dockerfile2
  |__heavy_utils
       |__utils.py

There are two approaches to building app_0, app_1, and app_2.

  1. One container for multiple apps - build one container using the Dockerfile at the root. The Dockerfile ends with a COPY command for each app:

COPY app_0 .
COPY app_1 .
COPY app_2 .

  2. Multiple containers for multiple apps - build a separate container from each Dockerfile$i inside app_$i (see the sketches after this list).
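For concreteness, here are minimal sketches of the two layouts. The base image, WORKDIR, and entrypoints are placeholders, not the exact contents of my repository.

Approach 1, the root Dockerfile:

FROM python:3.10-slim
WORKDIR /opt/ml/code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY heavy_utils ./heavy_utils
COPY app_0 .
COPY app_1 .
COPY app_2 .
# Docker honors only the last ENTRYPOINT, so one image can start only one app:
ENTRYPOINT ["python", "training_0.py"]

Approach 2, app_0/Dockerfile0 (app_1 and app_2 are analogous), assuming the build context is the repository root, e.g. docker build -f app_0/Dockerfile0 . :

FROM python:3.10-slim
WORKDIR /opt/ml/code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # duplicated in every image
COPY heavy_utils ./heavy_utils
COPY app_0/training_0.py .
ENTRYPOINT ["python", "training_0.py"]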

I tried both approaches; each has pros and cons.

  1. One container for multiple apps

Pros: when the image is uploaded to AWS ECR, storage is optimized because all 3 apps share the heavy dependencies in a single image.

Cons: when I plug the container into SageMaker training jobs, SageMaker cannot reach all 3 apps, because Docker honors only the last ENTRYPOINT in a Dockerfile, so the image can launch just one of them.

  2. Multiple containers for multiple apps

Pros: I can give each SageMaker training job its own ECR image with its own ENTRYPOINT (built and pushed as sketched below).

Cons: the heavy dependencies are duplicated across those ECR images.
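The per-app build-and-push loop looks roughly like this; the account ID, region, and repository names are placeholders, and the docker login step for ECR is omitted:

docker build -f app_0/Dockerfile0 -t 123456789012.dkr.ecr.us-east-1.amazonaws.com/app-0:latest .
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/app-0:latest
# ... and the same for app_1 and app_2; each image carries its own copy of the heavy dependencies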

I'd like to know:

  1. Which approach is more conventional, and is there a better practice?
  2. Can I specify a custom ENTRYPOINT for a SageMaker training job (as I can for a processing job) after the Docker image has been built? Specifically, I'm using the SageMaker SDK (sagemaker.estimator.Estimator) to build a SageMaker pipeline. AFAIK, the entry_point option only takes effect outside the container, i.e., it runs an external script from local disk or S3, which behaves differently from the entrypoint argument of sagemaker.processing.Processor (see the sketch below).
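For reference, this is roughly how I construct the training job versus the processing job; the image URIs, role ARN, and S3 paths are placeholders:

from sagemaker.estimator import Estimator
from sagemaker.processing import Processor

# Training job: the ENTRYPOINT baked into the image is what runs.
# Estimator's entry_point would upload and run a local/S3 script instead of
# selecting a command that already exists inside the image.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/app-0:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
estimator.fit({"train": "s3://my-bucket/train/"})

# Processing job: entrypoint overrides the image ENTRYPOINT at run time,
# which is the behavior I'm looking for in a training job.
processor = Processor(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/app-0:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    entrypoint=["python", "training_1.py"],
)
processor.run()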