We maintain a clone of our primary Git server for scanning and indexing content. Our "clone" is a collection of bare git repositories. When we begin one of our scan cycles, each bare repository is updated with a git fetch operation. Any branches that received commits since the last scan are then checked out (git checkout) to a working directory. This allows us to have multiple copies of the code (work trees), each on a different branch, available for processing while maintaining a single copy of the git repository. The work trees are cleaned up after our processing completes.
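For context, the update and cleanup steps for one repository look roughly like this; the paths are illustrative, and the checkout step itself is shown further down:

    # Refresh the bare repository at the start of a cycle
    # (assumes the bare clone has fetch refspecs configured for its remote)
    git --git-dir=/path/to/repo.git fetch origin

    # After processing completes, the scratch work trees are simply deleted
    rm -rf /work/path/branch1 /work/path/branch2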
We have some very active and large repositories where the checkout operation can take upwards of thirty minutes. If there are ten branches to be scanned, they are checked out sequentially: git --git-dir=/path/to/repo.git --work-tree=/work/path/branch1 checkout -f branch1, followed by git --git-dir=/path/to/repo.git --work-tree=/work/path/branch2 checkout -f branch2, and so on. (Note that we actually work with distinct commit IDs rather than branches, but the concept holds!) Sequential operation is required because git checkout creates an index.lock file in the bare repository. At 10 branches * 30 minutes, this can take upwards of five hours.
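To make the serialization concrete, the cycle amounts to something like this hypothetical loop, where every iteration has to wait for the previous checkout to release index.lock in the bare repository:

    # Sequential: each checkout holds /path/to/repo.git/index.lock for its duration
    for branch in branch1 branch2 branch3; do   # ...through branch10
        mkdir -p "/work/path/$branch"
        git --git-dir=/path/to/repo.git --work-tree="/work/path/$branch" checkout -f "$branch"
    done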
Is there any way the checkout operations can be parallelized, bypassing the index.lock file?
The git worktree add command will allow parallel operation. When you use it, each linked work tree gets its own index (kept under the repository's worktrees/ directory), so the checkouts no longer contend for a shared index.lock; then

git worktree add /work/path/branch1 branch1

and

git worktree add /work/path/branch2 branch2

can be executed in parallel.
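A rough sketch using the question's paths (the --detach flag matches checking out bare commit IDs, and the use of shell background jobs to fan out the work is illustrative):

    cd /path/to/repo.git

    # Each work tree gets its own index under worktrees/, so these can run concurrently
    git worktree add --detach /work/path/branch1 branch1 &
    git worktree add --detach /work/path/branch2 branch2 &
    wait

    # ...run the scans against /work/path/branch1 and /work/path/branch2...

    # Cleanup: remove the work trees and their administrative entries
    git worktree remove --force /work/path/branch1
    git worktree remove --force /work/path/branch2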