I'm working on an application that uses the LayoutLMv2 model, which uses Facebook AI’s Detectron2 package for its visual backbone. Both of these dependencies, as well as the application itself, require torch.
My goal is to have a container image in which the application can run training on an NVIDIA graphics card using CUDA.
Detectron2 does not have pre-built wheels for the newer versions of CUDA and PyTorch, so I need to build Detectron2 from source. See the Detectron2 installation instructions.
To allow Detectron2 to make use of CUDA, it has to be built in an environment where the CUDA development tools are present.
For this, the nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 image provided by NVIDIA can be used.
However, my application itself does not require these development tools, as the CUDA libraries it needs are already bundled with PyTorch.
Basing the application image on the devel image therefore results in a large image, because the CUDA development tools are still present after the Detectron2 build.
The documentation for these images states the following about the devel-tagged images: "These images are particularly useful for multi-stage builds."
Okay, so I can use a multi-stage build to build Detectron2 in the devel image and then copy the necessary files to a smaller image.
However, I'm not sure what files I need to copy and where to.
When I build Detectron2 from source using the devel image, the following files are created inside the cloned repository:
====================================================================================================
189 MB $ pip install --user -e detectron2_repo # buildkit
====================================================================================================
48 MB home/appuser/detectron2_repo/build/temp.linux-x86_64-3.8/home/appuser/detectron2_repo
25 MB home/appuser/detectron2_repo/build/lib.linux-x86_64-3.8/detectron2/_C.cpython-38-x86_64-linux-gnu.so
25 MB home/appuser/detectron2_repo/detectron2/_C.cpython-38-x86_64-linux-gnu.so
521 kB home/appuser/detectron2_repo/build/temp.linux-x86_64-3.8/.ninja_deps
(Created using dlayer)
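Going by this listing, the build/ tree appears to hold only intermediate output: the compiled extension shows up both under build/lib.linux-x86_64-3.8 and, as a copy, inside the detectron2 package directory. If the editable install is kept, a minimal sketch for trimming the layer could look like the following. It assumes the working directory is /home/appuser and that detectron2_repo is already cloned there, and it is untested; the removal has to happen in the same RUN instruction, because files deleted in a later layer still count towards the image size.

```dockerfile
# Sketch under assumptions: WORKDIR is /home/appuser and detectron2_repo exists there.
# Install in editable mode, then drop the intermediate build tree in the same layer.
RUN pip install --user -e detectron2_repo \
    && rm -rf detectron2_repo/build
```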
Would it make sense to copy the detectron2_repo directory to the smaller image? Or should I build a wheel and copy that to the smaller image? How would I go about doing that?
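To make the wheel option concrete, here is a rough, untested sketch of a two-stage build. The stage names, the /wheels and /opt/detectron2_repo paths, the requirements/torch.txt location and the architecture list are all assumptions for illustration. It relies on two things from Detectron2's own build setup: FORCE_CUDA=1 forces the CUDA extension to be compiled even though no GPU is visible during docker build, and TORCH_CUDA_ARCH_LIST selects the target GPU architectures. It also assumes that the CUDA libraries bundled with PyTorch are enough at runtime; if importing detectron2 fails on a missing CUDA library, a runtime-flavoured CUDA base image would be the fallback.

```dockerfile
# Sketch only: names, paths and the arch list are assumptions, not a tested build.

# ---- Stage 1: build a Detectron2 wheel where nvcc and the CUDA headers exist ----
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 AS builder

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-dev python3-pip git ca-certificates \
        build-essential ninja-build \
    && rm -rf /var/lib/apt/lists/*

# The extension must be compiled against the same torch version used at runtime.
COPY requirements/torch.txt /tmp/requirements/torch.txt
RUN python3 -m pip install --no-cache-dir --upgrade pip setuptools wheel \
    && python3 -m pip install --no-cache-dir -r /tmp/requirements/torch.txt

# No GPU is visible during `docker build`, so force the CUDA build and name the
# target architectures explicitly (adjust the list to the GPUs used for training).
ENV FORCE_CUDA=1
ARG TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6"

RUN git clone https://github.com/facebookresearch/detectron2 /opt/detectron2_repo
# --no-deps: build only the detectron2 wheel, not wheels for its dependencies.
# --no-build-isolation: let the build see the torch installed above.
RUN python3 -m pip wheel --no-deps --no-build-isolation \
        --wheel-dir /wheels /opt/detectron2_repo

# ---- Stage 2: runtime image without the CUDA development tools ----
FROM ubuntu:20.04 AS runtime

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Same pinned torch as in the builder stage.
COPY requirements/torch.txt /tmp/requirements/torch.txt
RUN python3 -m pip install --no-cache-dir --upgrade pip \
    && python3 -m pip install --no-cache-dir -r /tmp/requirements/torch.txt

# Only the built wheel crosses the stage boundary.
COPY --from=builder /wheels /tmp/wheels
RUN python3 -m pip install --no-cache-dir /tmp/wheels/detectron2-*.whl
```

Compared with copying the detectron2_repo directory, a wheel avoids carrying over the editable install's pointer from site-packages back to the repository, so the final stage needs no knowledge of where the source was cloned. Installing the wheel still pulls Detectron2's Python dependencies from the package index, and some of them may need a compiler if no binary wheel is available for this Python version.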
I would appreciate any guidance on the best approach to take for the multi-stage build and what specific files should be copied to the smaller image.
This is what I have come up with.
I generate torch.txt and detectron2.txt using pip-compile from pip-tools.

requirements/base.in

requirements/torch.in

requirements/detectron2.in

The Dockerfile