How to determine when an upstream git repository has been modified - for creating backups


How can I determine whether changes have been committed since the previous git bundle was created, without creating a working repository, looping through every branch, and recording every head revision?

One of the shortfalls I'm finding with Git is the lack of proper backup support for enterprise use. The enterprise differs from open source development in that there is always 1) an authoritative repository and 2) a backup system handling very large amounts of data. Thus there is motivation to both 1) back up very frequently and 2) run the backup process only when there are new changes. My problem is finding a solution for #2.

I'm using git bundle to create my archives, but I'm not finding a conclusive way to determine whether new changes have been committed since the previous backup.

I've been trying to find a combination of options for git rev-list to list new commit ids since the last bundle, but have been unsuccessful. A query on this topic reveals a very nice backup script written using:

git -C "${path}" rev-parse --short=10 HEAD

to mark the bundle with a commit id. That solution inadequately describes a snapshot of a git repository as other branches may have been updated leaving the HEAD revision of an upstream repository unaltered.

I've looked at using --max-age=<lastbackup epoch>, but quickly found that it's possible for a developer to push older changes after a backup has run. Since the dates of those commits do not change, they are older than the last backup date, and thus a backup is not triggered.

The best approach I have so far is:

git -C ${repo} rev-list -a --branches ${prev_commit}..HEAD

which does capture new revisions from other branches, but will continue to report revisions on other branches even after a newer commit has been made to HEAD.
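One way around this, offered only as a sketch, is to compare the state of all refs rather than a revision range: hash the output of git for-each-ref and compare it with the hash recorded at the previous backup. The function names, the state file, and the bundle path below are hypothetical; the approach assumes that any push worth backing up changes at least one ref, and it works on a bare repository.

```shell
#!/bin/sh
# Print a fingerprint of every ref (branches, tags, ...) in the repository "$1".
# for-each-ref lists "<object id> <ref name>" per ref; hash-object reduces that
# listing to a single id. No working tree is required.
refs_fingerprint() {
    git -C "$1" for-each-ref --format='%(objectname) %(refname)' |
        git -C "$1" hash-object --stdin
}

# Succeed (exit 0) when the refs of "$1" differ from the fingerprint stored in
# the hypothetical state file "$2", or when no state file exists yet.
# Returns 2 if the fingerprint cannot be computed.
needs_backup() {
    current=$(refs_fingerprint "$1") || return 2
    [ ! -f "$2" ] || [ "$current" != "$(cat "$2")" ]
}

# Example use (paths are placeholders):
#   fp=$(refs_fingerprint "$repo")
#   if needs_backup "$repo" "$state"; then
#       git -C "$repo" bundle create /backups/repo.bundle --all &&
#           printf '%s\n' "$fp" > "$state"
#   fi
```

Taking the fingerprint before creating the bundle means a push that arrives mid-backup is picked up on the next run instead of being silently skipped.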

I have not started looking into incremental backups yet, but I can already see that in order to verify one, I would need to create and manage a working repository when I prefer to just maintain bare repositories on our server.

Also I'll note that I have not found an option to git branch to remove the "*" so it will just give me a clean list of branches for scripting.
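For scripting, git for-each-ref is generally a better fit than git branch, since it prints no "*" marker and its output format is configurable. A minimal sketch, where list_branches is a hypothetical helper name:

```shell
#!/bin/sh
# Print one short branch name per line for the repository "$1",
# without the "*" marker that git branch puts on the current branch.
list_branches() {
    git -C "$1" for-each-ref --format='%(refname:short)' refs/heads/
}

# Example use (the path is a placeholder):
#   list_branches /srv/git/project.git | while read -r branch; do
#       echo "branch: $branch"
#   done
```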

What are other enterprises doing to back up their repositories?


There is 1 best solution below


You can use git -C "${path}" rev-parse --short=10 --branches instead. Although it prints fatal: Needed a single revision at the end of the output, it also displays the latest commit of each branch.

Because you use Git for version control, you only need the Git server or a third-party hosted service (such as GitHub or Bitbucket) to manage the different versions. It's convenient and saves time, and you don't need to track which version is current. The advantage is that the commit history can't be lost, so you don't need to do any archiving.