We have two different teams, each in its own location, working with git, each location having a reference repository. Each location has access to an enterprise network, but the two networks cannot be directly connected (trust me, we asked): we can only exchange files. We would like to be able to sync the two locations regularly so that the work can be shared through the respective reference repositories.
The requirements:
- Exchanges must be allowed in either direction.
- We need to be able to work on some branches simultaneously from both sides, or at least recover from cases where this happened, even if we expect to work on separate branches most of the time. This implies an integration step may be necessary to handle the divergent work.
- Most tracking must happen automatically, such that manual intervention, and the risk of manipulation errors from same, is minimized (not that they would be fatal, but best to avoid finger-pointing: trust is limited). In particular, the single, moving tag example used in the git-bundle man page is laughable, as that will not scale even to a limited number of branches (we have dozens).
- The reference repositories may only be manipulated through remote push/pull and if necessary light administrative operations, both because they are under IT control, and because we want them to be always consistent, i.e. integration is done first, and only then are the changes from the other side published, together with the integration, on the local reference repository.
- We cannot send the whole repository (even tar-gzipped) each time: not only is it rather large, but every package sent is also kept in records as part of contractual commitments, and having N copies of the repository in there would quickly become unsustainable.
- All the necessary information must be stored in the local reference repository, so that any developer may perform the syncing steps, without depending on information stored in the local repository(ies) of a particular developer.
- Work with git, not against it, as far as possible. The weirder the workflow is, the more likely it is to break because of a change in git or some other unexpected condition.
Non-requirements:
- Handling more than two disconnected sites. Two is going to be challenging enough already.
- Nightly processing. Exchanges are going to be triggered and handled manually.
- Limited number or complexity of commands. If many intricate commands are necessary, so be it, we can always hide that complexity in a script.
- Crossing the offline syncs. That always means trouble, just like with streams. Ergo, we can assume offline sync operations are totally ordered, regardless of their directions, taking turns if necessary.
- Branch management details, etc. That is our internal business.
The solution I have so far is to use the git bundle command, relying on remote references to keep track of what the other location already has, with some involved steps I came up with to carry these remote references through push/pull. Let our location be called site-a and the remote location be called site-b.

Generating a bundle to send to the remote location:
~/work$> git clone $LOCAL_REF_URL --mirror bundler
~/work$> cd bundler
~/work/bundler$> git bundle create ../bundle-site-a-$(date +%Y-%m-%d) --branches --tags --not --remotes=site-b
The bundler work repository may now be discarded.
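To see what the --not --remotes=site-b selection does, here is a minimal sketch on a throwaway repository. The scratch directory, the branch names (main, unchanged) and the hand-made refs/remotes/site-b/main reference are assumptions made for the demonstration; in the real workflow those references come from the mirror clone of the reference repository.

```shell
#!/bin/sh
# Sketch only: a scratch repository stands in for the mirror clone.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main   # fixed branch name, whatever git's default is

echo one > file
git add file
git commit -q -m "already received by site-b"
# Hypothetical state: site-b has confirmed everything up to this commit.
git update-ref refs/remotes/site-b/main HEAD
git branch unchanged                    # a branch with nothing new for site-b

echo two >> file
git commit -q -a -m "new since the last exchange"

# Same selection as in the bundling command above.
git bundle create "$work/bundle-site-a" --branches --tags --not --remotes=site-b

# What the bundle carries: commits not yet known to site-b, and the refs that moved.
new_commits=$(git rev-list --count --branches --tags --not --remotes=site-b)
bundled_refs=$(git bundle list-heads "$work/bundle-site-a" | cut -d' ' -f2)
echo "commits in bundle: $new_commits"
echo "refs in bundle: $bundled_refs"
```

Only the new commit on main ends up in the bundle, and the branch that has nothing new for site-b is left out of it altogether, which is why the integration side cannot rely on the bundle to list every branch.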
Integrating a bundle from the remote location:
1. ~/work$> git clone -n $LOCAL_REF_URL bundle-integration
2. ~/work$> cd bundle-integration
3. ~/work/bundle-integration$> git checkout --detach
4. ~/work/bundle-integration$> git fetch origin 'refs/heads/*:refs/heads/*' 'refs/remotes/site-b/*:refs/remotes/site-b/*'
5. ~/work/bundle-integration$> git remote add site-b ../bundle-site-b
6. ~/work/bundle-integration$> git fetch --tags site-b 'refs/heads/*'
7. Integrate the branches. Use
   git fetch . 'refs/remotes/site-b/*:refs/heads/*'
   to fast-forward the ones that can be in one fell swoop, then
   git checkout $BRANCH && git merge site-b/$BRANCH
   for the others: neither side of history can be rewritten. Also delete branches that the bundle took into account but no longer contains.
8. If
   git push --tags origin 'refs/heads/*:refs/heads/*' 'refs/remotes/site-b/*:refs/remotes/site-b/*' --prune
   fully succeeds, return; we are done.
9. Otherwise:
   ~/work/bundle-integration$> git fetch origin
   (a regular one)
10. Integrate what was just fetched from origin (with the
   git checkout $BRANCH && git merge origin/$BRANCH
   idiom), except for your own merging work, which can be rebased if you prefer, then try the push of step 8 again.

The bundle-integration work repository may now be discarded.
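The fast-forward-then-merge part of the integration can be tried in isolation. The following is a sketch under stated assumptions, not the full procedure: the branch names (ahead, diverged) are hypothetical, the refs/remotes/site-b/* references are fabricated by hand instead of being fetched from a bundle, and branch deletion and the push to origin are left out.

```shell
#!/bin/sh
# Sketch only: a scratch repository with hand-made site-b references.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/ahead

echo base > base.txt
git add base.txt
git commit -q -m "common base"
base=$(git rev-parse HEAD)
git branch diverged

# Fabricate what fetching site-b's bundle would leave in refs/remotes/site-b/*.
git checkout -q --detach
echo b1 > from-b-1.txt
git add from-b-1.txt
git commit -q -m "site-b work on ahead"
git update-ref refs/remotes/site-b/ahead HEAD
git checkout -q --detach "$base"
echo b2 > from-b-2.txt
git add from-b-2.txt
git commit -q -m "site-b work on diverged"
git update-ref refs/remotes/site-b/diverged HEAD

# Local (site-a) work on the branch that both sides touched.
git checkout -q diverged
echo a > from-a.txt
git add from-a.txt
git commit -q -m "site-a work on diverged"

# Park HEAD so that no branch is checked out while refs/heads/* is updated.
git checkout -q --detach

# Fast-forward whatever can be; diverged branches are rejected, hence "|| true".
git fetch -q . 'refs/remotes/site-b/*:refs/heads/*' || true

# Merge the branches that could not be fast-forwarded.
for ref in $(git for-each-ref --format='%(refname)' refs/remotes/site-b); do
    branch=${ref#refs/remotes/site-b/}
    if ! git merge-base --is-ancestor "$ref" "refs/heads/$branch"; then
        git checkout -q "$branch"
        git merge -q -m "Integrate site-b/$branch" "site-b/$branch"
    fi
done

ahead_tip=$(git rev-parse refs/heads/ahead)
siteb_ahead=$(git rev-parse refs/remotes/site-b/ahead)
merge_parents=$(git rev-list --parents -n 1 refs/heads/diverged | wc -w | tr -d ' ')
```

After the run, ahead has simply been moved to site-b's tip, while diverged carries a merge commit with both sides as parents. A merge that conflicts would stop the loop (because of set -e) and has to be resolved by hand, which is the integration work the requirements anticipate.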
Notes: step 1 cannot just be a mirror clone, as --mirror does not merely presume --bare, it forces it, which is incompatible with the need to perform integrations later: even trivial (fast-forward) git merge operations require a non-bare repository. Step 3 is necessary in order to "park" the HEAD away from any branch, otherwise step 4 is going to fail if and when it tries to directly update the branch that HEAD is pointing to. Step 4 is necessary (it does not fetch any commit) as it will set up all the necessary references since the remote bundle may not necessarily contain all branches (it omits ones where it provides no update), while in the end we're going to prune branches from the origin based on our own branches, so we want to start with all the branches origin has; specifying the refspecs from this step as -c options to the initial clone instead does not appear to work. Step 5 is necessary so git knows to update the references in refs/remotes/site-b/* in step 6.

Updating the remote tracking references, when the remote location has confirmed having been able to fetch the contents of a bundle sent to them:
This is done by following the steps from "Integrating a bundle from the remote location", except that the bundle we sent is treated as if it were coming from the remote location; obviously no integration work is necessary in that case, as the branches from our location are necessarily up-to-date with the information from the bundle.
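A minimal sketch of that bookkeeping, on a throwaway repository: the bundle we sent is registered as the site-b remote and fetched, which moves refs/remotes/site-b/* up to what site-b now has. The paths and the branch name main are assumptions, and the fetch refspec is spelled out in full here rather than relying on the remote's configured one; pushing the updated references to origin is not shown.

```shell
#!/bin/sh
# Sketch only: record that site-b has received the bundle we sent.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main

echo one > file
git add file
git commit -q -m "work sent to site-b"

# The bundle as it was sent (first exchange, so nothing is excluded).
git bundle create "$work/bundle-site-a-sent" --branches --tags

# Before the confirmation is recorded, the commit would be bundled again.
pending_before=$(git rev-list --count --branches --tags --not --remotes=site-b)

# Treat our own bundle as if it came from site-b.
git remote add site-b "$work/bundle-site-a-sent"
git fetch -q site-b 'refs/heads/*:refs/remotes/site-b/*'

# Now nothing is left to send until new work appears.
pending_after=$(git rev-list --count --branches --tags --not --remotes=site-b)
echo "pending before: $pending_before, after: $pending_after"
```

Once these references are pushed to the reference repository, the next bundle generated by any developer excludes what site-b has already confirmed.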