When I enter git merge "my-most-up-to-date-branch"
I get the following error:
The problem is that I don't know where to start, at all. There are differences in .py files, which I can edit by hand, but also in files that can't be fixed by hand, like the .db and .pyc files.
This is what I get when I enter git mergetool:
Also, I am not sure whether I understand what happens if I enter "m" or "d" in the second picture. Project's most up-to-date version is in the branch "reset-password". How do I solve this issue?
PS: I am sure there is a way to handle this without using merge, by making "reset-password" my new "master" branch. However, I really do want to be able to handle this problem by using merge, so that I can handle similar problems in my future professional life.
When you run git merge name, there are multiple possible outcomes:

1. Git finds nothing suitable to merge, complains, and never even starts the merge. That's not the case here.
2. Git starts the operation and is able to complete it on its own, because one of two things is true: the merge is really just a fast-forward, or it is a true merge in which Git found no conflicts. In these cases, the merge is finished, and you can go on and do more things in Git. That's also not the case here.
3. Git starts the operation, but is unable to complete it. You're left with a mess.
Case 3 had already happened at some point, before you ran git merge reset-password. Once case 3 has happened, you must clean up the mess before you can proceed. Running git merge again gets you the output you showed in your first image.

(Note: you can get stuck in this same situation with git cherry-pick and git revert, or with any command that invokes these. Since git rebase is performed by repeated cherry-pick operations, it too can leave you with merge conflicts. I am just guessing here that you had an earlier merge that never completed, based on the >M< in your shell prompt. It seems that most setups use >R< or >R for an incomplete rebase.)

As for not knowing where to start: well, you sort of do, because you've tried git mergetool. But this is jumping right into the deep end of Git. Unfortunately, pretty much all approaches involve jumping into the deep end here. Git sort of forces you to learn about the mess it leaves behind, and it's not simple.

Things to know beforehand
You may already know some of this, but it's worth at least a quick scan just in case. First, Git is really all about commits. Commits are numbered, but Git (and you) will generally find these numbers by branch names, because of these facts about commits:
Commits are numbered, but the numbers are big and ugly. They look like, for instance, e1cfff676549cdcd702cbac105468723ef2722f4. These numbers might seem random, but in fact, they are cryptographic checksums of the contents of some internal Git object. Each Git commit is a unique internal object, and thus gets a unique checksum.1

Commits contain two things: a full snapshot of every source file that Git knew about at the time you (or whoever) made the commit, and some metadata. The metadata include stuff like the name and email address of whoever made the commit. Crucially for Git, the metadata also include the raw hash ID of the previous commit, or for a merge commit, two or more previous commits. Git calls the previous commit the parent (and by implication, the commit itself is therefore a child of that parent).
The fact that the hash IDs are checksums means nothing about any commit can ever be changed. All commits are completely read-only. The files inside each commit are also read-only; to save space, they're compressed and stored in a Git-only format, with de-duplication.
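The checksum idea is easier to see in code. The short Python sketch below computes the hash ID that Git gives a file's contents (a blob object) in a SHA-1 repository. Git hashes a header, consisting of the object type, a space, the size in bytes and a NUL byte, followed by the raw bytes. Commit objects are hashed the same way with a "commit" header. That is why changing anything in a commit, even one byte of its metadata, would produce a different number. The sketch assumes the classic SHA-1 object format. Repositories created with the newer SHA-256 format get different IDs. The function names are illustrative and are not part of any Git API.

```python
import hashlib


def git_object_id(kind: str, data: bytes) -> str:
    """Return the SHA-1 hash ID Git assigns to an object of the given kind.

    Git hashes a short header ("<kind> <size-in-bytes>" plus a NUL byte)
    followed by the object's raw bytes.
    """
    header = f"{kind} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def blob_id(content: bytes) -> str:
    """Hash ID for a file's contents, as `git hash-object` would report it."""
    return git_object_id("blob", content)


if __name__ == "__main__":
    # The empty file always gets the same well-known blob hash ID.
    print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    # Changing even one byte of content gives a completely different number.
    print(blob_id(b"hello\n"))
    print(blob_id(b"hello!\n"))
```

As a cross-check, echo 'test content' | git hash-object --stdin should print the same ID as blob_id(b"test content\n").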
There are some important consequences of these three points that we'll go over extremely fast here:
Commits form chains. Since it's the child commit that holds the parent's hash ID—it has to be; the child's hash ID isn't predictable when we make the parent—these chains point backwards.
A branch name simply holds the hash ID of the last commit in some chain. However, being the last commit in some chain doesn't mean there cannot be more commits after this point: another branch name can point to a later commit.
Many commits are on multiple branches. The very first commit in a repository, which has no parent because there's no commit before it, is pretty commonly on every branch. (The only way for it not to be on every branch is to have more than one of these "first" or root commits. We won't look at how this can come about, here.)
Because the files inside a commit are read-only, the files that you work on (or with) are not in a commit. In an important sense, they are not in the repository itself at all.
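Here is a toy Python model of these consequences. It uses single-letter stand-ins (F, G, H, I) instead of real hash IDs, and ToyRepo is a hypothetical class, not anything in Git itself. Each commit stores its parent IDs, and a branch name stores only the ID of its last commit. Making a commit records the current tip as the parent and then moves the name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """A toy commit: an ID, a snapshot of files, and its parent IDs."""
    cid: str                       # stands in for the big ugly hash ID
    snapshot: Dict[str, str]       # file name -> contents
    parents: Tuple[str, ...] = ()  # metadata: the previous commit(s)


class ToyRepo:
    """Commits point backwards to their parents; branch names point to tips."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.branches: Dict[str, str] = {}  # branch name -> ID of its last commit

    def create_branch(self, name: str, from_branch: str) -> None:
        # A new branch name is just another pointer to an existing commit.
        self.branches[name] = self.branches[from_branch]

    def commit(self, cid: str, snapshot: Dict[str, str], branch: str) -> None:
        # The new commit's parent is whatever commit the branch name selects now.
        tip: Optional[str] = self.branches.get(branch)
        parents = (tip,) if tip is not None else ()  # very first commit: a root
        self.commits[cid] = Commit(cid, dict(snapshot), parents)
        # Last step: write the new commit's ID into the branch name.
        self.branches[branch] = cid

    def history(self, branch: str) -> List[str]:
        """Start at the branch tip and follow parent links backwards."""
        result: List[str] = []
        cid: Optional[str] = self.branches.get(branch)
        while cid is not None:
            result.append(cid)
            parents = self.commits[cid].parents
            cid = parents[0] if parents else None  # first parent only, for simplicity
        return result


if __name__ == "__main__":
    repo = ToyRepo()
    for cid in ["F", "G", "H"]:
        repo.commit(cid, {"run.py": f"version {cid}"}, "somebranch")
    repo.create_branch("anotherbranch", "somebranch")
    repo.commit("I", {"run.py": "version I"}, "anotherbranch")
    print(repo.history("somebranch"))     # ['H', 'G', 'F']
    print(repo.history("anotherbranch"))  # ['I', 'H', 'G', 'F']
```

Commits F, G and H appear in both histories, and I appears only in the history of anotherbranch. The frozen dataclass only loosely mirrors the read-only nature of commits, because Python still lets you mutate the dict inside it.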
The sections below aren't about merging at all, yet. We'll get to that in a larger heading, in a bit.
1 Pay no attention to the pigeonhole principle here, or see How does the newly found SHA-1 collision affect Git?
Extracting a commit: your work-tree
Let's expand a bit on that last bullet point. To get any actual work done, you need to get files out of some commit. Git will do this by extracting the frozen, compressed, and de-duplicated files (which sometimes aren't normal OS files at all, and which all have hash-ID names internally) into regular everyday files, putting those into a work area. This work area is not inside the repository.2
Git calls this work area your working tree or work-tree. Since this area is yours, you can create other files and directories/folders here, if you like. The files that Git knows about are, at least initially, those that Git just extracted from some existing commit. If you use the OS to create additional files, Git doesn't know about them, although Git will normally take care not to clobber them by accident.3
Pretty much all version control systems work like this: there are committed files, which are saved for all time,4 and some more-temporary ones that you can actually work on/with. This part, most people don't find confusing at all. Most other version control systems stop here, but Git being Git, it doesn't.
2 The repository itself is typically stored in a hidden .git directory at the top level of the work area. That is, the repository is in the work-tree, rather than the other way around! This is not always a sensible arrangement, and in modern Git, submodule repositories are normally moved out of the way, lest this part of your work-tree get removed.

3 Listing a file in a .gitignore sometimes gives Git permission to clobber it, and some Git commands, such as git clean, are supposed to destroy such files anyway. So this is not a total guarantee of safety. But in general, you can create files in your work-tree and not have Git ruin them. You'll see complaints from Git now and then about some work-tree file being in the way of a git checkout or git merge operation. Git is just telling you: "Hey, I found this file of yours, and if I overwrite it now, from a committed file, I'll be clobbering your data, so maybe you should move it out of the way first."

4 Or saved for as long as you don't tell the system to forget that commit, or whatever. The details of this vary quite a lot from one version control system to another.
Making new commits: Git's index
In other version control systems (VCSes), you check out some commit, and now you have a bunch of useful files. You make changes to those files, and when you are ready, you tell the VCS: commit these. It goes and finds what you did, and commits that. Some of these VCSes can be excruciatingly slow here. Git tends to be blazing-fast. It gets this speed at a price. That price has useful (to you) side benefits, but it's definitely confusing, and it's time to learn all about it.
Instead of just having two copies of each file, Git stores three. One of the three is the frozen (and de-duplicated) file in the current commit. You picked out some commit to work on, so that commit—with its big ugly hash ID—is the current commit, and that commit has a snapshot of all files.
At the opposite end, as it were, Git has copied all of those files out of the commit, into your work-tree. These are ordinary everyday files that you can do anything with.
Between these two copies, though, Git keeps a third "copy". The word "copy" is in quotes here because this third one is in the frozen form, and is pre-de-duplicated. Initially, all of these match the copies in the commit. This extra copy lives in something that Git calls the index, or the staging area, or sometimes—rarely these days—the cache. All three of these names are for the same thing. It has three names perhaps because index doesn't mean anything, and cache is too specific: the name staging area reflects its role.
When you go to make a new commit, Git uses the ready-to-go files that are in Git's index. Since they're in the right format, Git can make a new commit very quickly. But this means that if you change your copy in your work-tree, you have to tell Git to replace the index copy. The copy in the index is in the frozen format but isn't in a commit and therefore is not actually frozen.
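Here is a minimal sketch of the three copies. ToyCheckout is a hypothetical class in which plain dictionaries stand in for the commit, the index and the work-tree. Real Git stores the first two as compressed, de-duplicated objects. edit() changes only the work-tree, add() and rm() update the index, and commit() snapshots whatever is in the index.

```python
from typing import Dict


class ToyCheckout:
    """Git's three copies of each file: commit, index, and work-tree.

    Dictionaries (file name -> contents) are a simplification; real Git keeps
    the committed and index copies as compressed, de-duplicated blobs.
    """

    def __init__(self, commit_snapshot: Dict[str, str]) -> None:
        self.head = dict(commit_snapshot)      # read-only copy in the current commit
        self.index = dict(commit_snapshot)     # the proposed next commit
        self.worktree = dict(commit_snapshot)  # ordinary files you can change

    def edit(self, path: str, text: str) -> None:
        # Editing a file touches only the work-tree copy.
        self.worktree[path] = text

    def add(self, path: str) -> None:
        """Roughly `git add path`: make the index copy match the work-tree copy."""
        if path in self.worktree:
            self.index[path] = self.worktree[path]
        else:
            # File removed from the work-tree: modern `git add` stages the removal.
            self.index.pop(path, None)

    def rm(self, path: str) -> None:
        """Roughly `git rm path`: remove the file from the index and the work-tree."""
        self.index.pop(path, None)
        self.worktree.pop(path, None)

    def commit(self) -> Dict[str, str]:
        """Roughly `git commit`: the new snapshot is exactly what is in the index."""
        self.head = dict(self.index)
        return dict(self.head)


if __name__ == "__main__":
    co = ToyCheckout({"run.py": "v1", "app.db": "db1"})
    co.edit("run.py", "v2")
    print(co.commit())  # run.py is still "v1": the edit never reached the index
    co.add("run.py")
    co.rm("app.db")
    print(co.commit())  # {'run.py': 'v2'}
```

The first commit still holds the old run.py, which is the "proposed next commit" idea in action. Unlike Git, this toy happily makes a commit even when nothing has changed.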
The git add command is how you do all this. What git add file really means is "make the index copy of file match the work-tree copy." Git will replace the old index copy with a new one, compressing and de-duplicating the file at git add time, to make it ready to be committed. This means that instead of git commit being slow, it's git add that's slow. But you only have to do it on files that you changed, so it's not really that slow.

All of this, in turn, means that what's in the index (or staging area) is, in effect, your proposed next commit. Git filled it in from the current commit when Git extracted that commit: Git copied the commit to both Git's index and your work-tree. Now that you've changed stuff, or maybe added or even removed some files, you must update Git's index to match. You do this with git add, or git rm if you want to just remove stuff. This updates Git's index, and hence your proposed next commit.

Making new commits: updating a branch name
Before we move on to how merge works, let's take a moment to observe the process of making regular everyday non-merge commits, that is, commits with just one parent. We start with a simple linear chain of commits, ending with some particular last commit with a hash ID:

    ... <-F <-G <-H   <-- somebranch
Here H stands in for the actual hash ID of the last commit in the chain. Commit H holds a snapshot and metadata. Git can find the commit by its hash ID, and the hash ID is in the name somebranch. In the metadata for commit H, Git can find the hash ID of the earlier (parent) commit G, so using somebranch to find H lets Git find G. Commit G of course has a snapshot and metadata, and the metadata include the hash ID of its parent F. F, in turn, has a snapshot and metadata holding its own parent's hash ID. So given just the branch name, Git can find all the commits.

Let's make a second branch name that points to the same commit:

    ...--F--G--H   <-- somebranch (HEAD), anotherbranch
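We're still using commit H. The (HEAD) here tells us that we're using the name somebranch to find commit H. If you git checkout anotherbranch, we'll start using the name anotherbranch instead, but still find commit H:

    ...--F--G--H   <-- somebranch, anotherbranch (HEAD)

If you now modify some files and git add them to put the updated files into Git's index, you can run git commit to make a new commit. Git will:

1. save the files that are in its index as the new commit's snapshot;
2. gather the metadata: your name and email address, the current date and time, your log message, and the hash ID of the current commit H as the new commit's parent;
3. write all of this out as a new commit, which gets a new, unique hash ID that we'll call I.

There's one more step, but let's draw commit I now:

    ...--F--G--H
                \
                 I

Now let's add the branch names. First, note that the last step of git commit is that Git writes the new hash ID into the current branch name, the one with HEAD attached:

    ...--F--G--H   <-- somebranch
                \
                 I   <-- anotherbranch (HEAD)

Now commits up through H are on both branches, and new commit I is only on anotherbranch.

Merging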
We are now ready to tackle Git's merge operation. Let's consider, first, these facts:
We start with a situation like this:

              I--J   <-- branch1 (HEAD)
             /
    ...--G--H
             \
              K--L   <-- branch2
That is, the name branch1 selects some commit, which we'll call J, and the name branch2 selects some other commit that we'll call L. The one we are using right now is J: that's what's in Git's index and in our work-tree.

When we run git merge branch2, Git uses HEAD to locate our commit J, and uses the name branch2 that we gave as an argument to locate their commit L. But now Git needs to figure out what we changed and what they changed. That means Git has to find some earlier commit.

The right earlier commit is not always obvious, but what Git needs is a commit that is on both branches. Commit H is on both branches; so is commit G, and anything earlier. It kind of stands to reason, though, that the best commit is probably the one "closest to the ends": that is, commit H is "better" than commit G, because comparing the snapshot in H against either later commit will probably find fewer changes than comparing the snapshot in G, or anything earlier.

We call this "right commit" the merge base, and in any case, Git finds the merge base on its own. In this easy case, it's easy to see that Git will pick commit H. In more complex graphs, running git merge-base --all (for example, git merge-base --all branch1 branch2) may be the only sane way to see what Git is picking.5

To find what we changed, Git now runs, in effect:

    git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed

A very similar command finds what they changed:

    git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed
Git also, at this point, actually reads all three commits into the index.
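The merge-base search can be sketched in Python. The sketch assumes the commit graph is a plain dictionary from each commit to its parents, whereas real Git reads this from the commit objects. It collects the common ancestors of the two tips. It then throws away any common ancestor that is itself an ancestor of another common ancestor, which leaves the lowest common ancestors of footnote 5. The second example graph is a hypothetical criss-cross history, the kind of graph in which more than one merge base exists.

```python
from typing import Dict, List, Set

# Commit ID -> list of parent IDs (a merge commit has two or more).
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, cid: str) -> Set[str]:
    """Every commit reachable from cid by following parent links, cid included."""
    seen: Set[str] = set()
    stack = [cid]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def merge_bases(graph: Graph, ours: str, theirs: str) -> Set[str]:
    """Lowest common ancestors of two commits, in the spirit of `git merge-base --all`."""
    common = ancestors(graph, ours) & ancestors(graph, theirs)
    best = set(common)
    for cid in common:
        # An ancestor of another common ancestor is never the "closest" one.
        best -= ancestors(graph, cid) - {cid}
    return best


if __name__ == "__main__":
    # The graph from the text: ...--G--H, then I--J on branch1 and K--L on branch2.
    graph: Graph = {
        "F": [], "G": ["F"], "H": ["G"],
        "I": ["H"], "J": ["I"],
        "K": ["H"], "L": ["K"],
    }
    print(merge_bases(graph, "J", "L"))  # {'H'}

    # Hypothetical criss-cross merges: two equally good merge bases.
    crisscross: Graph = {"A": [], "B": ["A"], "C": ["A"], "X": ["B", "C"], "Y": ["C", "B"]}
    print(sorted(merge_bases(crisscross, "X", "Y")))  # ['B', 'C']
```

The repeated ancestor walks make this quadratic. That is fine for illustrating the idea, but it is not how Git does it efficiently on large histories.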
5 Git uses a lowest common ancestor (LCA) algorithm here. When applied to a Directed Acyclic Graph, there may be more than one LCA. The --all option to git merge-base tells this command to print out all LCAs. Different merge strategies may use just one merge base, or all of them; we won't go into the details here.

Merging really takes place in the index
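Earlier we saw that the index had a copy of each file. This is the normal state for the index, but during a merge, the index actually expands. Instead of one copy, in what Git calls slot zero, it holds three, in numbered slots:

- slot 1: the copy from the merge base (commit H here);
- slot 2: the copy from our commit J (the HEAD or --ours version);
- slot 3: the copy from their commit L (the --theirs version).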
That is, if the merge base, our commit, and their commit all have a README.md file, the index now has three README.md files in it. With some Git commands,6 we can name these using a digit and some colons: :1:README.md, :2:README.md, and :3:README.md.

This repeats for every file in the three commits. Some of the commits might not have all the file names. For instance, the git diff --find-renames above might find that, from commit H to L, they renamed some file; in this case the index entries are a little trickier. Or perhaps we or they deleted a file, in which case the slot-2 or slot-3 entry is missing. Or perhaps someone added a whole new file, in which case there's no slot-1 entry but there is a slot-2 or slot-3 entry. You have these cases, so we can't ignore them forever. But they're a little more complicated, so for now, we will set them aside. The rest is pretty straightforward:

If Git was able, from the above, to figure out which version to use, Git just moves that version from these nonzero numbered slots to slot number zero, and erases the higher-numbered slots. A slot-zero entry is the normal "this is the file, ready to be committed" copy. So that file is now resolved. Git puts the chosen copy into your work-tree as well.
If not, Git goes on to try a low-level merge of the file.7
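The "figure out which version to use" step can be sketched as a small Python function (pick_version is a hypothetical name). It assumes the file exists in all three slots, so it leaves out the added, deleted and renamed cases. It compares blob hash IDs, which is enough because identical contents always get identical IDs.

```python
from typing import Optional


def pick_version(base: str, ours: str, theirs: str) -> Optional[str]:
    """Choose a file version from index slots 1, 2 and 3 without merging.

    Each argument is the blob hash ID in that slot. Returns the ID to move
    to slot zero, or None if the file needs a real, line-by-line merge.
    """
    if ours == theirs:
        return ours    # nobody changed it, or both sides made the same change
    if ours == base:
        return theirs  # only they changed it
    if theirs == base:
        return ours    # only we changed it
    return None        # both changed it, differently: low-level merge needed
```

Only the final case reaches the low-level merge described in the next section.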
6 Most Git commands that can take a hash ID can take names that Git resolves into a hash ID. This resolution is done through the rules outlined in the gitrevisions documentation. So git rev-parse :1:README.md prints out the internal blob hash ID for that file. When using git show or git cat-file -p, you can give either the hash ID or the name; they'll run the name through an internal rev-parse as needed.

7 You can specify a merge driver instead of letting Git use its built-in one. This also gets somewhat complicated.
Low-level merging is normally done diff-hunk-by-diff-hunk
Suppose that we have three different versions of run.py in the index. The diff from base to ours says to make a change to line 42, while the diff from base to theirs says to make a different change to line 54. Git will simply take both changes and apply them to the merge base copy of the file.

If we and they changed the same line(s), Git will compare what we both used as the new replacement(s) for them. If our replacements match, Git will take one copy of this change.
If we and they changed the same lines but to different text, Git will declare a merge conflict in this file, and will arrange for the merge to stop in the middle. The extended (-X) options can tell Git not to stop after all (by telling it to favor ours or theirs), but we'll skip over these.

If there are no merge conflicts after combining our changes and their changes, Git will, as usual, put the result into index slot zero and your work-tree. This file is also resolved.
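The hunk-by-hunk rules above can be sketched in Python. This is an illustration, not Git's implementation. It uses the standard library's difflib.SequenceMatcher as a stand-in for Git's own diff code, so on some inputs it may split hunks differently than Git would. Git also tends to treat changes to directly adjacent lines as a conflict, so this sketch groups changes that overlap or touch. For a conflicting region, the sketch writes both sides plus the merge-base lines between marker lines. This resembles what you get with merge.conflictStyle set to diff3, except that the ours/base/theirs labels are simplified; Git writes branch or commit names there.

```python
import difflib
from typing import List, Tuple

Hunk = Tuple[int, int, List[str]]  # replace base[start:end] with these lines


def hunks(base: List[str], other: List[str]) -> List[Hunk]:
    """Diff base against one side and keep only the changed regions."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def apply_hunks(base: List[str], lo: int, hi: int, side: List[Hunk]) -> List[str]:
    """Rewrite base[lo:hi] using one side's hunks (all of which fall inside it)."""
    result: List[str] = []
    pos = lo
    for start, end, lines in side:
        result.extend(base[pos:start])
        result.extend(lines)
        pos = end
    result.extend(base[pos:hi])
    return result


def merge3(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], int]:
    """Combine both sides' changes to base; return (merged lines, conflict count)."""
    tagged = sorted([(s, e, l, "ours") for s, e, l in hunks(base, ours)] +
                    [(s, e, l, "theirs") for s, e, l in hunks(base, theirs)],
                    key=lambda h: (h[0], h[1]))
    out: List[str] = []
    conflicts, pos, k = 0, 0, 0
    while k < len(tagged):
        # Group hunks that overlap or touch into one region of the base file.
        lo, hi = tagged[k][0], tagged[k][1]
        group = [tagged[k]]
        k += 1
        while k < len(tagged) and tagged[k][0] <= hi:
            hi = max(hi, tagged[k][1])
            group.append(tagged[k])
            k += 1
        out.extend(base[pos:lo])  # unchanged lines before this region
        ours_h = [(s, e, l) for s, e, l, who in group if who == "ours"]
        theirs_h = [(s, e, l) for s, e, l, who in group if who == "theirs"]
        ours_v = apply_hunks(base, lo, hi, ours_h)
        theirs_v = apply_hunks(base, lo, hi, theirs_h)
        if not theirs_h or ours_v == theirs_v:
            out.extend(ours_v)    # only we changed it, or we both made the same change
        elif not ours_h:
            out.extend(theirs_v)  # only they changed it
        else:
            conflicts += 1        # different changes to one region: diff3-style markers
            out += ["<<<<<<< ours", *ours_v, "||||||| base", *base[lo:hi],
                    "=======", *theirs_v, ">>>>>>> theirs"]
        pos = hi
    out.extend(base[pos:])
    return out, conflicts
```

For example, with base ["a", "b", "c", "d", "e"], a change to "b" on our side and a change to "e" on theirs merge cleanly. If both sides instead change "c" to different text, the result contains one marked conflict region.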
If Git isn't able to resolve the conflict, the low level merge code will write its best effort at merging the three files to your work-tree. (What happens to Git's index, well, we'll leave that for the next section.) The work-tree file will use the combined changes wherever they didn't conflict. Where they did conflict, it will contain lines from both "sides" of the merge. If you set merge.conflictStyle to diff3 (for instance, with git config --global merge.conflictStyle diff3), the conflicted region will also include the corresponding lines from the merge-base version of the file. I like to set this option always; I find the resulting conflicts easier to read.

High level conflicts, also known as tree conflicts
In the section above, I talked about how Git handles conflicts within the three versions of some file, where there is a merge base copy, the --ours copy, and the --theirs copy, and all three differ. But let's see what happens with these cases:

Suppose they delete a file and we don't do anything to it. What should Git do with this? Git's answer is to take the deletion. Git keeps the file deleted in the merge result by emptying out all index slots, including slot zero, and making sure that the file isn't there in your work-tree.
Suppose we delete a file and they don't do anything to it. Git handles this the same way.
Suppose we, or they, delete a file, and they or we—the other side—modify the file. Git's answer is to declare a merge conflict and just leave two of the three copies in the index. Git calls this a modify/delete conflict.
Suppose we rename a file (without changing its content), and they either don't change it, or change it but don't rename it. Git's answer is to combine both changes: take their changes if any, and use our new file name. The same applies if they rename it and we don't. If we both modify the file's content and the low level code can combine the content changes, Git resolves the file by taking both the rename and the combined content change.
If we both rename the file, but to different new names, Git calls this a rename/rename conflict.
If we both create all-new files, with different content but the same name, Git calls this an add/add conflict.
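The cases above can be summarized in a sketch that classifies one path from its contents in the three commits, with None meaning "no such file". The function name classify and the returned labels are illustrative, not Git output, and renames are deliberately left out. A .pyc file that one side modified and the other side deleted lands in the modify/delete branch: classify(base, modified_copy, None).

```python
from typing import Optional


def classify(base: Optional[str], ours: Optional[str], theirs: Optional[str]) -> str:
    """Classify one path by its contents in the base, ours and theirs commits.

    None means the file does not exist in that commit. Renames are ignored:
    spotting them needs rename detection across different paths.
    """
    if ours == theirs:
        return "resolved"                # same result on both sides, even both deleting it
    if base is None:
        if ours is None or theirs is None:
            return "resolved"            # only one side created it: keep the new file
        return "add/add conflict"        # both created it, with different contents
    if ours is None or theirs is None:
        survivor = theirs if ours is None else ours
        if survivor == base:
            return "resolved"            # one side deleted an unmodified file: take the deletion
        return "modify/delete conflict"  # one side deleted it, the other changed it
    if base in (ours, theirs):
        return "resolved"                # only one side changed the contents
    return "low-level merge needed"      # both changed the contents: merge line by line
```

Only the last case goes on to the low-level merge. The two conflict labels are the high level ones, where -X ours and -X theirs do not help.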
These conflicts that involve file names or entire file creation/deletion are all high level or tree conflicts, because they don't involve low-level content conflicts. We can even get both high and low level conflicts, e.g., with a rename/rename conflict plus a low level conflict. But the main point here is that if we do get one of these high level conflicts, the extended (-X ours and -X theirs) options have no effect: those options are only handled by the low-level merge code.8

In any case, if Git does stop with a merge conflict, it leaves the nonzero slot number entries in its index. This leaves the two or three input files available to commands like git mergetool, and leaves enough traces for git mergetool to diagnose high level conflicts such as modify/delete conflicts.

8 There may, in the future, be some fancier high level conflict handlers that do allow some -X options. But today there aren't.

Your job: clean up the mess
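We now know what kind of mess Git leaves behind. Each file that Git could not resolve has entries in the nonzero-numbered index slots, and its work-tree copy holds Git's best-effort attempt at a merge.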
Your job is to finish the merge. You may do this any way you like.
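Before picking a way, it can help to see exactly which slots each unresolved file occupies. git ls-files -u lists each unmerged index entry in the same format as git ls-files --stage: mode, blob hash ID, slot ("stage") number, a tab, then the path. The Python sketch below parses that text and labels each path by its occupied slots, using the phrases git status uses for unmerged paths. It only parses text you hand it. It does not run Git. The sample hash IDs in the demo are made-up placeholders, and example.pyc is a hypothetical file name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set

# Which nonzero index slots are present -> the wording `git status` uses.
LABELS: Dict[FrozenSet[int], str] = {
    frozenset({1, 2, 3}): "both modified",
    frozenset({1, 2}): "deleted by them",
    frozenset({1, 3}): "deleted by us",
    frozenset({2, 3}): "both added",
    frozenset({2}): "added by us",
    frozenset({3}): "added by them",
    frozenset({1}): "both deleted",
}


def unmerged_slots(ls_files_output: str) -> Dict[str, Set[int]]:
    """Map each path to the slot numbers it occupies in `git ls-files -u` output.

    Each line looks like "<mode> <blob-hash> <slot><TAB><path>". Paths with
    unusual characters come out quoted unless you use -z; that is ignored here.
    """
    slots: Dict[str, Set[int]] = defaultdict(set)
    for line in ls_files_output.splitlines():
        if not line.strip():
            continue
        info, path = line.split("\t", 1)
        _mode, _blob, slot = info.split()
        slots[path].add(int(slot))
    return dict(slots)


def describe_unmerged(ls_files_output: str) -> Dict[str, str]:
    """Label every unmerged path, e.g. {"run.py": "both modified"}."""
    return {path: LABELS.get(frozenset(found), "unknown")
            for path, found in unmerged_slots(ls_files_output).items()}


if __name__ == "__main__":
    # Made-up sample: the hash IDs are placeholders, not real objects.
    sample = (
        "100644 1111111 1\tapp.db\n"
        "100644 2222222 2\tapp.db\n"
        "100644 3333333 3\tapp.db\n"
        "100644 4444444 1\t__pycache__/example.pyc\n"
        "100644 5555555 2\t__pycache__/example.pyc\n"
    )
    print(describe_unmerged(sample))
```

To feed it real data from inside your repository, you could pass it subprocess.run(["git", "ls-files", "-u"], capture_output=True, text=True).stdout. A modified-here, deleted-there .pyc file shows up as "deleted by them".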
You don't have to use the higher-numbered index entries. But if you want to, git mergetool gives you a convenient way to access them, one that does not require fumbling around with git show :1:file.ext, git show :2:file.ext, and git show :3:file.ext and a lot of temporary files: git mergetool does that for you.

You don't have to use the work-tree copies of the files, with their partial merges.

You do have to run git add or git rm, but git mergetool can do that for you too. To mark the conflict resolved, you will either remove the index copies entirely (meaning that the final commit won't have the file at all), or write the correct merge result to index slot zero.

Your particular case
In your particular case, you have __pycache__/*.pyc files listed (four of them) and two other files, app.db and run.py.

The __pycache__ files should almost never be in a Git repository. Your merge conflict for one of them shows what the two git diffs that git merge ran found. One side of the merge, the --ours side (merge base vs HEAD), had modified the file, while the other side of the merge had removed it.

The correct resolution here would be to take their change, i.e., to remove the file entirely. For git mergetool, then, the answer would be d: use the deletion, rather than keeping your modified file.

For app.db, the correct result is probably not your file, but might not be their file either. The correct result might be some combination of both files. If the database is binary, Git's simple newline-based text substitution rules, for combining two git diffs and applying the combined changes to the merge base copy, simply don't work at all. It's up to you how to produce the correct final app.db copy. But let's assume there is a magic command that can read both app.db input files (the :2:app.db and :3:app.db index copies, which you can extract with git show) and produce the right result. You might run it so that it combines them and writes the correct combined data to app.db. Now that your work-tree copy is what you want to commit, you would just run:

    git add app.db

This erases the three numbered slots (:1:app.db, :2:app.db, and :3:app.db are all gone) and copies (and compresses and freezes and de-duplicates) the current app.db into index slot zero.

For run.py, perhaps you should look at their file and your file, and perhaps the merge base version as well. Use an editor or merge tool or whatever you will use to figure out what the correct merge result is. Or perhaps the work-tree copy, with Git's attempt at merging, is sufficient for you to figure out what should be in that file. The git mergetool command is likely to offer you a way to run some merge tool over all three inputs. In most cases, I prefer to just edit run.py in the editor and figure it out, using the three sections from my diff3 setting for merge.conflictStyle.

If you have git mergetool run a tool, then either:

- git mergetool knows a lot about this tool and can trust it to exit with a status code that says "all merged, use the result" or "not merged, don't use", and git mergetool will correctly run git add for you or not; or
- git mergetool doesn't know enough about the tool, but will run it and then ask you if it should use the result.

If git mergetool uses the result, it will do its own git add run.py. If not, you still have the three copies in the index. You can open run.py in your favorite editor, look it over, and decide whether it's all correct or needs more changes. You can run tests, and so on.

Even if git mergetool does add the file, you can still look it over and run tests. Resolving the file just means getting the index set up so that Git thinks the merge is done.

Committing the final merge
If Git thinks it did the merge on its own, Git will make a new merge commit:

              I--J
             /    \
    ...--G--H      M   <-- branch1 (HEAD)
             \    /
              K--L   <-- branch2
This merge commit has a hash ID, like any commit. It has a snapshot, like any commit. It has metadata, like any commit, with one difference. It lists commit J as its first parent, so that M points back to J, but it also lists commit L as its second parent, so that M also points back to L. Now commits H-I-J-M are on branch1 (plus earlier commits), but so are H-K-L-M (plus earlier commits). So now all the commits that were only on branch2 before are on both branches. New commit M is only on branch1, and, as usual, is the new tip of the branch: Git wrote M's hash ID into the name branch1.

If Git doesn't make the merge commit on its own due to merge conflicts, you:

1. resolve all the conflicts, as described above, so that each file is back to a single slot-zero entry or is removed entirely; then
2. run git merge --continue or git commit,9

and Git will now make merge commit M as before, with the two parents. Or, you can run:

    git merge --abort

This erases the index (well, resets it to match J, really) and puts your work-tree back to matching commit J. You'll be back in the situation you were in before you started the merge. (Any work you did to resolve the merge is gone, so be a bit careful here!)

9 All git merge --continue really does is make sure that you're in the middle of a merge, then run git commit. So it's a bit of safety: it won't do anything if you think you're in a conflicted merge but somehow you aborted it earlier, or finished it. Usually in that situation git commit will tell you there's nothing to commit, too, so this is rarely important.