I'm doing analytical work inside a "Lab" Docker environment that I manage. I use Travis to build, tag and publish the Lab image to a Docker container registry (AWS ECR), and I always pull the latest image when I start the container to do my analytical work. This ensures I'm always working inside the latest version of the Lab environment. Note: each time Travis publishes a new image, it tags it in ECR with both the build's git commit ID and latest.

For reproducibility of my analytical results, I would like my Python code running inside the container to record in its outputs an identifier that indicates the exact Docker image being used. This would enable me to re-download that particular image from ECR many months or years later and/or find the git commit from which the image was built, run the code again, and (hopefully!) get the same results.

What is the most standard way of achieving this? Can I perhaps store the image digest as an environment variable inside the container?

There are 3 best solutions below

3
On BEST ANSWER

There are probably a couple of options, but it depends on how the image is built.

Assuming the source code is cloned in CI and the image is built from that source (so you're not cloning the source code in the Dockerfile), you can use a build-arg to "bake" the commit into the image as an environment variable.

In your Dockerfile, define a build-arg (ARG) and assign its value to an environment variable (ENV). You need to assign it to an ENV because build-args are (by design) not persisted in the image itself; they are only available during the build.

For example:

FROM busybox:latest
ARG GIT_COMMIT=HEAD
ENV GIT_COMMIT=${GIT_COMMIT}

I'm setting a default value so that the variable contains something "useful" if the Dockerfile is built without passing a build-arg.

Then, when building the image, pass the git commit as a build-arg:

git clone https://github.com/me/my-repo.git && cd my-repo

export GIT_COMMIT=$(git rev-parse --short --verify HEAD)

docker build -t lab:${GIT_COMMIT} --build-arg GIT_COMMIT=${GIT_COMMIT} .

When running the image, GIT_COMMIT is available as an environment variable.
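
On the Python side, reading the baked-in value could look like the sketch below. Only the GIT_COMMIT variable name comes from the Dockerfile above; the function name `provenance_record`, the `"unknown"` fallback and the timestamp field are illustrative assumptions.

```python
import json
import os
from datetime import datetime, timezone


def provenance_record(environ=os.environ):
    """Return a small dict identifying the image this code is running in.

    GIT_COMMIT is the variable set by ENV in the Dockerfile; the "unknown"
    fallback is an assumption for runs outside the container.
    """
    return {
        "git_commit": environ.get("GIT_COMMIT", "unknown"),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Store this record alongside (or inside) the analysis outputs.
    print(json.dumps(provenance_record(), indent=2))
```

Saving this record with every result file makes it possible to look up the matching commit, and the image tagged with it, later on.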

If you would rather pass a reference at runtime instead, set it when you start the container; for example, to pass the digest of the image that you're running:

docker pull lab:latest

export IMAGE_DIGEST=$(docker inspect --format '{{ (index .RepoDigests 0) }}' lab:latest)

docker run -it --rm -e IMAGE_DIGEST=${IMAGE_DIGEST} lab:latest
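
Inside the container, the digest reference can be split into its repository and digest parts before it is recorded. This is only a sketch: `parse_image_digest` is a hypothetical helper, and it assumes the value has the `repository@sha256:<hex>` form that RepoDigests entries normally take.

```python
import os
import re

# repository@sha256:<64 hex chars>, the usual shape of a RepoDigests entry
_DIGEST_RE = re.compile(r"(?P<repository>[^@\s]+)@(?P<digest>sha256:[0-9a-f]{64})")


def parse_image_digest(reference):
    """Split 'repo@sha256:...' into (repository, digest); None if malformed."""
    match = _DIGEST_RE.fullmatch(reference or "")
    if match is None:
        return None
    return match.group("repository"), match.group("digest")


if __name__ == "__main__":
    # IMAGE_DIGEST is the variable passed with `docker run -e` above.
    print(parse_image_digest(os.environ.get("IMAGE_DIGEST")))
```

The full `repository@sha256:...` string is also what `docker pull` accepts when pulling by digest, so recording it unmodified is enough to fetch the same image again.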
1
On

Append the commit ID to your image tag.

For example: imagename:dev-v1-bc4da47

where bc4da47 is the last commit ID.

You can get the last commit ID with:

git rev-parse --short HEAD
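
If the commit ID lives only in the tag, code that is given the full image reference (for example through an environment variable, which is an assumption here) can recover it by taking the part after the last hyphen. A sketch, with `commit_from_tag` as a hypothetical helper and the tag layout from the example above:

```python
def commit_from_tag(image_ref):
    """Return the trailing commit ID from 'name:prefix-commit', or None."""
    name, sep, tag = image_ref.rpartition(":")
    # No ':' at all, or the ':' belonged to a registry port (host:5000/name).
    if not sep or "/" in tag:
        return None
    # Everything after the last '-' in the tag is taken to be the commit ID.
    return tag.rsplit("-", 1)[-1]
```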
1
On

When you build the image, pass in a build argument with the git hash:

$ docker build --build-arg GIT_HASH=$(git rev-parse --short HEAD) -t yourimage .

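And in your Dockerfile, declare the build argument and copy it into an environment variable (an ARG on its own is only available during the build):

ARG GIT_HASH
ENV GIT_HASH=${GIT_HASH}

You should now have an environment variable with the git hash available to code running inside the resulting container.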

Long version: https://pythonspeed.com/articles/identifying-images/