I'm digging into the caching of Docker buildx to try to debug an issue. I'm trying to figure out how, exactly, buildx checks if a layer is available in the local cache. Although I've searched fairly extensively, I can't seem to find any documentation on this.
Looking at the local cache files themselves, I see a bunch of files with hash names. My assumption is that it works as follows (assuming use of `type=local,mode=max`):
- For each line in the Dockerfile, it uses some combination of parameters to calculate a SHA hash.
- It checks in the `--cache-from` directory to see if a file with that hash as the name exists.
  - If it does exist, it uses that file as the layer and doesn't rebuild anything (and copies that file to the `--cache-to` directory).
  - If it does not exist, it builds the layer and saves it as a file, with that hash as the name, in the `--cache-to` directory.
- This results in an output cache with 1 file for each line in the Dockerfile.
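The assumed flow above can be sketched in Python as follows. This only models the assumption as stated; it is not how BuildKit is implemented, and the function name `build_step`, the use of the instruction text alone as the hash input, and the "layer" file contents are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def build_step(instruction: str, cache_from: Path, cache_to: Path) -> bool:
    """Return True on a cache hit, False if the layer had to be 'built'."""
    # Hypothetical key: a SHA-256 of the instruction text only.
    key = hashlib.sha256(instruction.encode()).hexdigest()
    source = cache_from / key
    target = cache_to / key
    if source.exists():
        # Hit: reuse the cached file and carry it over to the output cache.
        if source != target:
            shutil.copyfile(source, target)
        return True
    # Miss: "build" the layer and store it under its hash name.
    target.write_text(f"layer for: {instruction}\n")
    return False


with tempfile.TemporaryDirectory() as tmp:
    empty, first, second = Path(tmp, "empty"), Path(tmp, "run1"), Path(tmp, "run2")
    for directory in (empty, first, second):
        directory.mkdir()
    steps = ["FROM alpine", "RUN echo hello"]
    # First build starts from an empty cache, so every step misses.
    cold = [build_step(s, empty, first) for s in steps]
    # Second build reads the cache written by the first, so every step hits.
    warm = [build_step(s, first, second) for s in steps]
    files = sorted(p.name for p in second.iterdir())
```

Under this model the second run's output directory ends up with one hash-named file per Dockerfile line, which is the outcome the last bullet describes.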
So my questions are:
- Is my understanding of this process correct? Am I missing any key elements?
- For step (1) above, what are the "parameters" that it uses to calculate the hash? I would think it's the string value of the line itself, plus the contents of any files that are copied by the line (e.g. `ADD`), but does it use anything else, such as the last-modified timestamp of any files that it copies?
My understanding is roughly along those lines. I'd need to check the code myself to know the specifics.
In general, caching of Dockerfile steps uses the following (this predates buildkit):
(`COPY --link` may be an exception to this.) So if you copy a new file into the image, all remaining steps of that build stage are no longer found in the cache, because Docker has no way to generically know that a specific file doesn't affect some `RUN` steps.
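Here's a minimal Python sketch of that cascade. It assumes each step's cache key is derived from the previous step's key, the instruction text, and the contents of any copied files. The `step_keys` function and the exact key composition are hypothetical, chosen for illustration rather than taken from the Docker or BuildKit source.

```python
import hashlib


def step_keys(steps, files):
    """Compute one chained cache key per step.

    steps: list of (instruction, [paths copied by that instruction])
    files: dict mapping path -> file content as bytes
    """
    keys = []
    parent = ""  # key of the previous step; empty for the first step
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())       # depends on everything before it
        h.update(instruction.encode())  # depends on the instruction text
        for path in sorted(copied):     # depends on copied file contents
            h.update(path.encode())
            h.update(hashlib.sha256(files[path]).digest())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


steps = [
    ("FROM alpine", []),
    ("RUN apk add --no-cache curl", []),
    ("COPY app.sh /app.sh", ["app.sh"]),
    ("RUN chmod +x /app.sh", []),
]
before = step_keys(steps, {"app.sh": b"echo v1\n"})
after = step_keys(steps, {"app.sh": b"echo v2\n"})

# A step is a cache hit only if its key is unchanged between builds.
hits = [a == b for a, b in zip(before, after)]
print(hits)  # [True, True, False, False]
```

Changing `app.sh` invalidates the `COPY` step, and because the final `RUN` hashes its parent's key, it misses too even though its own text is unchanged. That is also why putting rarely changing steps before frequently changing `COPY` steps keeps more of the cache usable.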