Does docker layer caching combined with pip install lead to nonreproducible images?


Given the following Dockerfile:

FROM python:3.9
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
COPY ./app /code/app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]

When we build it multiple times on the same machine, my understanding is that the cached Docker layer will be reused for RUN pip install unless we change requirements.txt. However, on a fresh machine without the layer cache, and after a new version of some package has been released, the same Dockerfile will lead to different packages being installed, correct?

If yes, what is the best practice to ensure

  • reproducible builds
  • fast builds using docker layer caching
  • using the newest available packages?

I could envision that using e.g. pip-compile --upgrade from pip-tools could be helpful, but I understand too little about how Docker caches text files.


There is 1 answer below


In Docker, once a layer changes, all downstream layers have to be recreated as well.

There's probably a better source for this, but here is a source: https://learn.microsoft.com/en-us/visualstudio/docker/tutorials/image-building-best-practices.

I don't know what's in your requirements.txt, but one reason your generated Docker image might differ is that, by the time you change your requirements.txt, days or months have passed and some of the sub-requirements of the packages in requirements.txt have been updated upstream.

To prevent this, use pinned requirements: set each version explicitly, e.g. package==1.2.3.4.

If you also want to pin the sub-requirements, look at a tool like Poetry (https://python-poetry.org/). It creates a lock file, so you get an environment identical to the one you had when building in the dev environment.
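As a rough sketch of how pinning and layer caching fit together, here is the Dockerfile from the question with one change: --upgrade is dropped. It assumes that requirements.txt is already fully pinned with == for every package, including sub-requirements (for example generated by pip-compile or exported from a lock file); that assumption is mine, not something stated in the question. Docker decides whether to reuse a COPY layer by comparing a checksum of the copied file's contents, so the pip install layer should be rebuilt only when the pinned file actually changes.

```dockerfile
FROM python:3.9
WORKDIR /code

# Copy only the requirements file first. The cache for this layer is keyed
# on the file's contents, so editing app code does not invalidate it.
# ASSUMPTION: requirements.txt pins every package and sub-requirement with ==.
COPY ./requirements.txt /code/requirements.txt

# No --upgrade here: with exact pins, pip installs the listed versions,
# whether the layer comes from cache or is built on a fresh machine.
RUN pip install --no-cache-dir -r /code/requirements.txt

# Application code changes often, so it is copied last.
COPY ./app /code/app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
```

With this layout, picking up newer packages becomes an explicit step: regenerate the pinned file, commit it, and the changed file contents invalidate the cached layer on the next build.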

If you always want the latest available packages, I don't understand why you would want to bake them into a Docker image (perhaps you should do a daily build without layer caching).