Conceptually, how is `git revert` related to a three-way merge?

I am trying to understand how git revert uses a three-way merge, based on https://stackoverflow.com/a/37151159

Suppose the current branch is B. Does the command git revert C create a commit D such that B is the result of a three-way merge of C and D with respect to C~?

There are 2 best solutions below

git revert performs a three-way merge with an unusually chosen merge base. git cherry-pick does the same thing, except that it chooses a different merge base.

In all cases, we can draw out the commit graph:

...--o--o--C1--C2--...--o   <-- somebranch
         \
          o--o--L   <-- our-branch (HEAD)

or:

...--o--C1--C2--o--...--L   <-- our-branch (HEAD)

or similar (draw whatever your graph looks like).

You tell Git: cherry-pick C2 or revert C2. Your current commit is L, found via HEAD. Git now proceeds to do a three-way merge operation. What's unusual about this is that the merge base, called B below, is either C1 or C2, and the other commit, called R below, is also either C1 or C2—whichever wasn't the merge base. For git cherry-pick, B = C1 and R = C2. For git revert, B = C2 and R = C1.
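The role swap described above can be written down as a tiny Python sketch. The function name merge_inputs and the string labels are hypothetical, chosen only for illustration; nothing here calls Git.

```python
def merge_inputs(operation, commit, parent):
    """Return (merge_base, other_side) for a cherry-pick or revert of `commit`.

    `commit` and `parent` are plain labels such as "C2" and "C1".
    """
    if operation == "cherry-pick":
        return parent, commit   # B = C1, R = C2
    if operation == "revert":
        return commit, parent   # B = C2, R = C1
    raise ValueError(f"unsupported operation: {operation}")


print(merge_inputs("cherry-pick", "C2", "C1"))  # ('C1', 'C2')
print(merge_inputs("revert", "C2", "C1"))       # ('C2', 'C1')
```

In both cases the third input, L, is simply the current HEAD commit.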

How three-way merges work, in short but reasonably complete form

All Git merges are implemented the same way.[1] We start[2] with three commits:

  • There is a merge base commit B.
  • There is a left-hand-side or local or --ours commit L. The git mergetool code calls it "local", but most Git commands just call it HEAD or --ours.
  • There is a right-hand-side or remote or --theirs commit R. The git mergetool code calls it "remote", while git merge itself uses MERGE_HEAD.

The merge base for many real merges is obvious from the graph:

          o--...--L   <-- our-branch (HEAD)
         /
...--o--B
         \
          o--...--R   <-- their-branch

For a cherry-pick or a revert, commits B and R are forced to some particular commit. For instance, if you run git revert <hash>, B is the commit you identified and R is its parent:

...--o--R--B--o--...--L   <-- our-branch (HEAD) 

Now, with the three commits B, L, and R—or rather, their hash IDs—in hand, Git will, in effect, run two git diff operations:

  • git diff --find-renames B L
  • git diff --find-renames B R

The first diff finds files that are different between the base and the left hand side (including any renamed files across that gap). The second diff finds files that are different between the base and the right hand side, again including any renamed files.
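As a rough model of those two diffs, each commit can be treated as a dictionary from path to file content. This is only a sketch under that assumption: the snapshots below are made up, and the rename detection that --find-renames performs is not modeled.

```python
def changed_paths(base, other):
    """Paths whose content differs between two snapshots (dicts of path -> content).

    A path present in only one snapshot counts as changed (added or deleted).
    Rename detection is deliberately left out of this sketch.
    """
    return {p for p in base.keys() | other.keys() if base.get(p) != other.get(p)}


# Hypothetical snapshots standing in for commits B, L and R.
B = {"a.txt": "one", "b.txt": "two", "c.txt": "three"}
L = {"a.txt": "one", "b.txt": "two (ours)", "c.txt": "three"}
R = {"a.txt": "one", "b.txt": "two", "c.txt": "three (theirs)", "d.txt": "new"}

print(sorted(changed_paths(B, L)))  # ['b.txt']
print(sorted(changed_paths(B, R)))  # ['c.txt', 'd.txt']
```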

Any file that is not changed on either side is the same in all three commits, so the merge result is the single version of that file that all three share.

For any file changed on only one side, Git takes the version of the file from that side.

For any file changed on both sides, but to the same contents, Git can take either the L or the R copy. These two copies are by definition identical, so Git picks one (actually always L, since that is more convenient: Git does all this work directly in the index, and this lets it avoid moving the L file out of slot zero in the first place).

Last, for any file changed differently on both sides, Git attempts to combine the two sets of changes, and may succeed or fail. The combined changes get applied to the copy of the file that came from the base commit B. If the combining is successful, that's the result of the merge. Otherwise Git leaves its best effort at merging in the work-tree, and stops with a merge conflict.[3] Adding -X ours or -X theirs tells Git: instead of stopping with a conflict, resolve this conflict by choosing the ours or theirs hunk from the diff. Note that this is the only case that actually has to populate the three index slots for the file, and then invoke the low-level merge code (or your merge driver from .gitattributes, if you set one).
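The four per-file cases above can be summarized in a short Python sketch. It works on whole-file contents only; the function name resolve_file and the status labels are hypothetical, and the hunk-level combining that Git does in the last case is not implemented here.

```python
def resolve_file(base, ours, theirs):
    """Classify one path given its content in B, L and R.

    Returns (status, content); content is None when a content-level
    merge of the two sides' changes would be needed.
    """
    if ours == base and theirs == base:
        return "unchanged", ours
    if theirs == base:
        return "ours-only", ours        # only L changed the file
    if ours == base:
        return "theirs-only", theirs    # only R changed the file
    if ours == theirs:
        return "same-change", ours      # identical change; keep the L copy
    return "content-merge-needed", None


print(resolve_file("x", "x", "x"))  # ('unchanged', 'x')
print(resolve_file("x", "y", "x"))  # ('ours-only', 'y')
print(resolve_file("x", "x", "z"))  # ('theirs-only', 'z')
print(resolve_file("x", "y", "y"))  # ('same-change', 'y')
print(resolve_file("x", "y", "z"))  # ('content-merge-needed', None)
```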

A successful result is automatically committed as a merge commit by git merge, or as an ordinary commit by git cherry-pick or git revert, unless you tell Git not to commit the result. A failed (due to conflicts) merge stops, leaving a mess in the index and work-tree, which you must clean up.


[1] Git's so-called octopus merge still works like this, but is iterative, repeatedly merging multiple branch tips into the index without committing the result. This makes it a little bit special since the ours state is only in the index, rather than an actual commit. The other Git commands generally check that the index and HEAD commit match, except that git cherry-pick -n and git revert -n simply use the index as if it were a commit, the same way that octopus merge does. In the main answer text above, you can think of the index's content as the ours commit: internally, Git just shifts all the stage-zero entries to stage-2 to make this happen.

[2] For a recursive merge invoked by git merge -s recursive or git merge, Git first finds the merge base for you. This may turn up more than one commit. If that does happen, Git merges the merge bases, using (an internal version of) git merge. This is the recursive part of the "recursive merge": merging the merge bases so as to come up with a single commit. That single commit is then the merge base for the outermost git merge.

If you use git merge -s resolve, and Git finds more than one merge base, Git chooses a simpler approach: it picks one of the merge bases, seemingly at random. It is not really random, since Git just takes whichever one comes out most easily from its merge-base-finding algorithm, but it is not carefully controlled, and there is no inherent reason to prefer any one candidate merge base to any other.

[3] For a merge conflict during the recursive (inner) merge that happens when merging two merge bases, Git simply commits the conflicted text. The result is not very pretty.

I don't know how you are visualizing it... but this is how I see it:

Running git revert C while you are on B asks Git to do a three-way merge between B and C~, forcing Git to treat C as the branching point of both revisions.

Definition: under normal circumstances, the branching point is the last revision that is present in the history of both branches. After that revision, the histories of the two branches have no further commits in common.
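To make this concrete, here is a small Python sketch of a revert done as a file-level three-way merge, using the question's names (B is the current tip, C the commit being reverted, D the result). The snapshots are invented, commits are modeled as dictionaries from path to content, and merge_trees is a hypothetical helper that gives up where a real merge would combine changes within a file.

```python
def merge_trees(base, ours, theirs):
    """File-level three-way merge of snapshots (dicts of path -> content).

    Raises ValueError where a real merge would go on to combine the
    two sides' changes inside the file.
    """
    result = {}
    for path in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            merged = o          # same on both sides, or only ours changed
        elif o == b:
            merged = t          # only theirs changed
        else:
            raise ValueError(f"both sides changed {path}")
        if merged is not None:  # None means the file is absent
            result[path] = merged
    return result


# Hypothetical history: C~ -> C -> ... -> B (current branch tip).
parent_of_C = {"app.py": "v1", "notes.txt": "draft"}
C = {"app.py": "v2", "notes.txt": "draft"}      # C changed app.py
B_tip = {"app.py": "v2", "notes.txt": "final"}  # later work changed notes.txt

# git revert C: merge base = C, ours = B, theirs = C~.
D = merge_trees(base=C, ours=B_tip, theirs=parent_of_C)
print(sorted(D.items()))  # [('app.py', 'v1'), ('notes.txt', 'final')]
```

In this model, D undoes C's change to app.py while keeping the later change to notes.txt, which is what one expects from a revert.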