Offline syncing of locally-central git repositories

We have two teams, each in its own location and working with git, and each location has its own reference repository. Each location has access to an enterprise network, but the two networks cannot be directly connected (trust me, we asked): we can only exchange files. We would like to be able to sync the two locations regularly so that the work can be shared through the respective reference repositories.

The requirements:

  • Exchanges must be allowed in either direction.
  • We need to be able to work on some branches simultaneously from both sides, or at least to recover from cases where this has happened, even if we expect to work on separate branches most of the time. This implies that an integration step may be necessary to handle the divergent work.
  • Most tracking must happen automatically, so that manual intervention and the resulting risk of handling errors are minimized (not that such errors would be fatal, but it is best to avoid finger-pointing: trust is limited). In particular, the single moving tag used in the git-bundle man page example is laughable, as that approach will not scale even to a limited number of branches (we have dozens).
  • The reference repositories may only be manipulated through remote push/pull and, if necessary, light administrative operations, both because they are under IT control and because we want them to always be consistent, i.e., integration is done first, and only then are the changes from the other side published, together with the integration, on the local reference repository.
  • We cannot send the whole repository (even tar-gzipped) each time: not only is it rather big in itself, but every package we send is also kept on record as part of our contractual commitments, and having N copies of the repository in there would quickly become unsustainable.
  • All the necessary information must be stored in the local reference repository, so that any developer may perform the syncing steps, without depending on information stored in the local repository(ies) of a particular developer.
  • Work with git, not against it, to the extent possible. The weirder the workflow, the more likely it is to break because of a change in git or some other unexpected condition.

Non-requirements:

  • Handling more than two disconnected sites. Two is going to be challenging enough already.
  • Nightly processing. Exchanges are going to be triggered and handled manually.
  • Limited number or complexity of commands. If many intricate commands are necessary, so be it, we can always hide that complexity in a script.
  • Crossing the offline syncs. That always means trouble, just like with streams. Ergo, we can assume offline sync operations are totally ordered, regardless of their directions, taking turns if necessary.
  • Branch management details, etc. That is our internal business.

The solution I have so far is to use the git bundle command, relying on remote-tracking references to keep track of what the other location already has, with some involved steps I came up with to carry these references through push/pull. Let our location be called site-a and the remote location site-b.

  • Generating a bundle to send to the remote location:

    1. ~/work$> git clone --mirror $LOCAL_REF_URL bundler
    2. ~/work$> cd bundler
    3. ~/work/bundler$> git bundle create ../bundle-site-a-$(date +%Y-%m-%d) --branches --tags --not --remotes=site-b

    The bundler work repository may now be discarded.
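The three steps above can be wrapped in a small POSIX shell function. This is a minimal sketch under a few assumptions: the helper name make_bundle and its two parameters are hypothetical, the remote is named site-b as above, and mktemp is available. It also adds a git bundle verify sanity check that the original steps do not include.

```shell
#!/bin/sh
# make_bundle: hypothetical helper wrapping the bundle generation steps.
# Usage: make_bundle LOCAL_REF_URL OUTPUT_DIR
make_bundle() {
    ref_url=$1
    # Resolve the output directory to an absolute path, since the git
    # commands below run with -C inside the temporary clone.
    out_dir=$(cd "$2" && pwd) || return 1
    work=$(mktemp -d) || return 1
    bundle="$out_dir/bundle-site-a-$(date +%Y-%m-%d)"

    # Step 1: a mirror clone copies every ref, including
    # refs/remotes/site-b/*, which records what site-b already has.
    # Step 3: bundle all branches and tags, minus what site-b already has.
    # Finally, check that the bundle's prerequisites are present locally.
    if git clone --quiet --mirror "$ref_url" "$work/bundler" &&
        git -C "$work/bundler" bundle create "$bundle" --branches --tags --not --remotes=site-b &&
        git -C "$work/bundler" bundle verify "$bundle" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi

    # The bundler repository is disposable.
    rm -rf "$work"
    return "$status"
}
```

Note that git refuses to create an empty bundle, so the function fails when site-b already has everything; a wrapper script may want to treat that case as "nothing to send" rather than as an error.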

  • Integrating a bundle from the remote location:

    1. ~/work$> git clone -n $LOCAL_REF_URL bundle-integration
    2. ~/work$> cd bundle-integration
    3. ~/work/bundle-integration$> git checkout --detach
    4. ~/work/bundle-integration$> git fetch origin 'refs/heads/*:refs/heads/*' 'refs/remotes/site-b/*:refs/remotes/site-b/*'
    5. ~/work/bundle-integration$> git remote add site-b ../bundle-site-b
    6. ~/work/bundle-integration$> git fetch --tags site-b 'refs/heads/*'
    7. At this point, the fetch has reported which site-b remote-tracking branches were updated with information from the bundle, so insert here the work needed to integrate the ones that have corresponding branches in our location: first, git fetch . 'refs/remotes/site-b/*:refs/heads/*' to fast-forward, in one fell swoop, the ones that can be; then git checkout $BRANCH && git merge site-b/$BRANCH for the others, since history on neither side may be rewritten. Also delete the branches that the bundle took into account but no longer contains.
    8. If git push --tags origin 'refs/heads/*:refs/heads/*' 'refs/remotes/site-b/*:refs/remotes/site-b/*' --prune fully succeeds, return; we are done
    9. ~/work/bundle-integration$> git fetch origin (a regular one)
    10. Integrate the work done at your location while you were busy performing the previous steps; that still has to be done with merge (though with the more usual git checkout $BRANCH && git merge origin/$BRANCH idiom), except for your own merging work, which has not been published yet and can therefore be rebased if you prefer
    11. goto 8

    The bundle-integration work repository may now be discarded.

    Notes: step 1 cannot simply be a mirror clone, as --mirror does not merely imply --bare, it forces it, which is incompatible with the need to perform integrations later: even trivial (fast-forward) git merge operations require a non-bare repository. Step 3 is necessary in order to "park" HEAD away from any branch; otherwise step 4 fails if and when it tries to directly update the branch HEAD points to. Step 4 is necessary even though it does not fetch any commits: it sets up all the required references. The bundle does not necessarily contain every branch (it omits those for which it provides no update), while at the end we prune branches on origin based on our own branches, so we must start with all the branches origin has. Specifying the refspecs from this step as -c options to the initial clone instead does not appear to work. Step 5 is necessary so that git knows to update the references in refs/remotes/site-b/* in step 6.
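Step 7 is the part most worth automating. The function below is a minimal sketch of it, assuming it runs inside the bundle-integration clone after step 6; the name integrate_site_b is hypothetical, and it does not handle the branch deletions mentioned in step 7. It fast-forwards everything it can in one fetch, as described, then merges whatever has diverged, stopping at the first conflict.

```shell
#!/bin/sh
# integrate_site_b: hypothetical helper automating step 7.
# Run inside the bundle-integration clone after step 6.
# Branch deletions are NOT handled by this sketch.
integrate_site_b() {
    # Park HEAD (as in step 3) so fetch may update any refs/heads/* ref.
    git checkout --quiet --detach || return 1

    # Fast-forward (or create) every local branch from its site-b
    # counterpart. Refspecs without '+' refuse non-fast-forward updates,
    # so diverged branches are left alone and fetch exits non-zero,
    # which is expected and ignored here.
    git fetch --quiet . 'refs/remotes/site-b/*:refs/heads/*' 2>/dev/null || :

    # Ref names cannot contain whitespace or glob characters, so plain
    # word splitting over this list is safe.
    for branch in $(git for-each-ref --format='%(refname:strip=3)' refs/remotes/site-b/); do
        [ "$branch" = HEAD ] && continue
        # Already integrated: site-b's tip is contained in our branch.
        if git merge-base --is-ancestor "refs/remotes/site-b/$branch" "refs/heads/$branch"; then
            continue
        fi
        # Diverged: history may not be rewritten, so merge.
        git checkout --quiet "$branch" -- || return 1
        if ! git merge --no-edit "refs/remotes/site-b/$branch"; then
            echo "merge conflict on $branch: resolve and commit, then rerun" >&2
            return 1
        fi
    done

    # Leave HEAD parked again, ready for the push in step 8.
    git checkout --quiet --detach
}
```

After a conflict, resolve and commit it as usual, then call the function again: it detaches HEAD first, and branches that already contain their site-b counterpart are skipped.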

  • Updating the remote-tracking references once the remote location has confirmed that it was able to fetch the contents of a bundle we sent:

    This is done by following the steps from "Integrating a bundle from the remote location", except using the sent bundle as if it were coming from the remote location; no integration work is necessary in that case, as the branches at our location are necessarily up to date with the information in the bundle.
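When all that needs to change is refs/remotes/site-b/*, the procedure can be condensed. The sketch below is a variant under stated assumptions, not the full procedure: the helper name ack_bundle is hypothetical, LOCAL_REF_URL must be a URL or an absolute path (the push runs from inside the temporary clone), and, unlike step 8, it does not prune, so branches deleted on our side are not reflected.

```shell
#!/bin/sh
# ack_bundle: hypothetical, condensed acknowledgement step.
# Usage: ack_bundle SENT_BUNDLE LOCAL_REF_URL
# LOCAL_REF_URL must be a URL or an absolute path.
ack_bundle() {
    case $1 in
        /*) bundle=$1 ;;
        *) bundle=$PWD/$1 ;;
    esac
    ref_url=$2
    work=$(mktemp -d) || return 1

    # A mirror clone brings along the current refs/remotes/site-b/* refs
    # and the commits the bundle's prerequisites point to.
    # The bundle's branch tips are then recorded as what site-b now has;
    # without '+', a ref that would move backwards is refused.
    # Only the site-b refs are pushed; there is no --prune here.
    if git clone --quiet --mirror "$ref_url" "$work/ack" &&
        git -C "$work/ack" fetch --quiet "$bundle" 'refs/heads/*:refs/remotes/site-b/*' &&
        git -C "$work/ack" push --quiet "$ref_url" 'refs/remotes/site-b/*:refs/remotes/site-b/*'; then
        status=0
    else
        status=1
    fi

    rm -rf "$work"
    return "$status"
}
```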