Solving the git upstream rebase "hard case"


Wouldn't the git upstream rebase "hard case" problem be solved if each branch carried a reference to its initial base commit (say, branch@{base})?

This base commit (stored under branch.<name>.base in the config file, for example) would initially be the commit the branch pointed to when it was created.

Then, any git rebase new_base_commit branch could actually do a git rebase --onto new_base_commit branch@{base} branch, before updating branch@{base} to new_base_commit.

It would simply automate the "hard case" resolution scenario of the documentation.
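To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. The function name rebase_from_recorded_base is made up for illustration, and branch.<name>.base is the hypothetical config key proposed above, not something Git defines or maintains itself.

```shell
# Hypothetical helper: rebase BRANCH onto NEW_BASE, replaying only the
# commits made after the recorded base.
# NOTE: "branch.<name>.base" is NOT a real Git setting; it is the key
# proposed in this question and has to be written by hand or by a wrapper.
rebase_from_recorded_base() {
    new_base=$1
    branch=$2

    old_base=$(git config "branch.$branch.base") || {
        echo "no recorded base for $branch" >&2
        return 1
    }

    # Replay old_base..branch on top of new_base.
    git rebase --onto "$new_base" "$old_base" "$branch" || return 1

    # Remember the new base for the next rebase.
    git config "branch.$branch.base" "$(git rev-parse "$new_base")"
}
```

Usage would be something like git config branch.topic.base "$(git rev-parse main)" right after creating topic, then rebase_from_recorded_base main topic after main has been rewritten.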

I suppose that if such a simple solution is not already implemented, there must be good reasons not to implement it. And since I can't see any, I must have misunderstood something.

So if there are such reasons, what are they?


EDIT: Reading bk2204's answer made me realize that this behavior would be useful and expected only for the special use case of tracking branches (which I should've realized sooner since it's about upstream rebase), so the initial base should be recorded only for tracking branches, and used only for commands using an implicit @{upstream}, like git rebase without arguments.


EDIT: I just discovered that actually, git pull --rebase and git rebase already do something similar using the algorithm of git merge-base --fork-point, but the latter uses the reflog, which can be garbage-collected, to compute the fork point on the fly.

So I still wonder: why not simply store it next to branch.<name>.remote and branch.<name>.merge instead?

For example, when the user starts tracking another branch*, the fork point could be computed with git merge-base --fork-point upstream local and stored under git config branch.local.forkPoint (or any other name), along with git config branch.local.remote and git config branch.local.merge.
Then, when the user performs a git pull --rebase or a git rebase, it could do**:

git rebase --onto local@{upstream} `git config branch.local.forkPoint` local

And if the user tries to perform a git pull or a git merge, it could first check that local@{upstream} wasn't rebased, with:

git merge-base --is-ancestor `git config branch.local.forkPoint` local@{upstream}

If it was rebased, the command could abort and suggest either doing a rebase instead or writing out the full merge command to force the merge (for example).
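As a rough sketch, the recording step and the check could be wrapped in two small shell functions. The function names are invented for this example and branch.<name>.forkPoint is the hypothetical key suggested above; only git merge-base --fork-point, git merge-base --is-ancestor, git rev-parse and git config are real Git commands here.

```shell
# Record the fork point of BRANCH relative to its upstream.
# NOTE: "branch.<name>.forkPoint" is a hypothetical key, not a Git setting.
record_fork_point() {
    branch=$1
    # Resolve the upstream to a full ref name (e.g. refs/remotes/origin/main).
    upstream_ref=$(git rev-parse --symbolic-full-name "$branch@{upstream}") || return 1
    # --fork-point consults the upstream's reflog, so run this while the
    # relevant reflog entries still exist.
    fork_point=$(git merge-base --fork-point "$upstream_ref" "$branch") || return 1
    git config "branch.$branch.forkPoint" "$fork_point"
}

# Exit status 0 if the recorded fork point is still an ancestor of the
# upstream (upstream not rewritten past it), 1 if it is not,
# 2 if nothing was recorded.
upstream_contains_fork_point() {
    branch=$1
    fork_point=$(git config "branch.$branch.forkPoint") || return 2
    git merge-base --is-ancestor "$fork_point" "$branch@{upstream}"
}
```

A wrapper around git merge could then call upstream_contains_fork_point local and refuse to continue when it returns 1.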


EDIT: I think that, to correctly handle the case described in "The Perils of Rebasing" in the doc when "synchronizing" the branch with its upstream by a merge instead of a rebase, the last "synchronization point" should also be checked, to verify that the upstream wasn't rebased since then either.

So each git pull or git merge should also store the merge parent commit from the upstream branch somewhere (like branch.local.lastSyncPoint maybe) after applying the merge. Before applying the merge, it should also check that:

git merge-base --is-ancestor `git config branch.local.lastSyncPoint` local@{upstream}

Actually, this check could make the check on the fork point redundant.
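Here is a minimal sketch of a merge wrapper along those lines. The name merge_upstream_checked is invented, and branch.<name>.lastSyncPoint is the hypothetical key from this edit, not an existing Git setting.

```shell
# Merge the current branch's upstream, but refuse if the upstream no longer
# contains the commit we merged from last time.
# NOTE: "branch.<name>.lastSyncPoint" is a hypothetical key.
merge_upstream_checked() {
    branch=$(git symbolic-ref --short HEAD) || return 1
    upstream=$(git rev-parse --verify "$branch@{upstream}") || return 1

    # Empty on the first synchronization, when nothing has been recorded yet.
    last=$(git config "branch.$branch.lastSyncPoint") || last=""

    if [ -n "$last" ] && ! git merge-base --is-ancestor "$last" "$upstream"; then
        echo "upstream was rewritten since the last sync; consider a rebase" >&2
        return 1
    fi

    git merge --no-edit "$upstream" || return 1

    # Remember which upstream commit was merged.
    git config "branch.$branch.lastSyncPoint" "$upstream"
}
```

The first call records the merged upstream commit; a later call fails with a message if that commit has since disappeared from the upstream's history.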


EDIT: Moreover, I think that a rebase should discard all commits reachable from the last "synchronization point" that aren't included in the (rebased) upstream (local@{upstream}..`git config branch.local.lastSyncPoint`). That would make it work as expected in the case of discarded commits.


* with git switch --create local --track upstream or git checkout -b local upstream or git branch --track local upstream or git branch --set-upstream-to upstream local

** instead of an on-the-fly:

git rebase --onto local@{upstream} `git merge-base --fork-point local@{upstream} local` local

There are 2 best solutions below


First, Git doesn't track where a branch "begins" in the way you're thinking of. A branch is merely a pointer to a commit, and each commit contains a pointer to one or more previous commits. So if I do something like git checkout -b topic-branch, Git doesn't record the commit from which this branch was created, and that commit isn't special in any way.
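A quick way to see this is to create a branch in a throwaway repository and look at what the refs contain. This is only an illustrative sketch; the repository location and the demo identity are arbitrary.

```shell
# Throwaway repository showing that a new branch is just another name
# for an existing commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "initial"

start=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
git checkout -q -b topic-branch

# Each ref stores only a commit id; both branches list the same one.
git for-each-ref --format='%(objectname) %(refname:short)' refs/heads/

same=no
[ "$(git rev-parse "$start")" = "$(git rev-parse topic-branch)" ] && same=yes
echo "same commit: $same"
```

The two branch refs hold the same object id, and neither ref says anything about which branch the other one was created from.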

Secondly, Git doesn't prioritize any one branch over another as being special. Creating a new branch from an existing one doesn't mean that the original branch is special or different from the one I just created, nor does Git assume I will merge my new branch into my old one. For example, instead of having a single main branch, I may have branches named after versions: v1, v2, v3, etc. So creating v4 from v3 isn't any different than creating topic-branch from the main branch. There'd be no point in tracking the base for v4 because it will never merge into v3.

Third, Git specifically does not track branches as a concept. It is intentional that commits don't include branch names or fork points. Branches are intended to be lightweight references to existing commits. It's intended that two branches can point to the same commit, and this is important for pack efficiency. If I wanted to create a branch major-feature that included and extended topic-branch that was in turn based off main, I'd either have to expose the history in my commits that I turned a small topic into a major feature or I'd have to rewrite all the commits in major-feature.

So while tracking this information would help some use cases, it would add a bunch of uninteresting metadata for a lot of other cases and would add a bunch of complexity.


First: there wouldn't be a single branch@{base} for a branch.
The "base" would not be the same for, e.g., git rebase master and git rebase develop.

Second: you would still have to analyze the list of commits.
Suppose another developer (let's call him Dave) committed to the develop branch. What if Dave fixed the same bug as you did in one of his commits? Or what if he cherry-picked one of your commits?
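The cherry-pick case can be tried out in a throwaway repository. The sketch below uses git cherry, which compares commits by the patch they introduce; the file names and commit messages are invented for the demo.

```shell
# Throwaway repository: Dave cherry-picks one of your commits onto develop.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name demo
git config user.email demo@example.com

echo base > base.txt; git add base.txt; git commit -q -m "Base"

# Your work on a topic branch: a bug fix plus something else.
git checkout -q -b topic
echo fix > fix.txt;     git add fix.txt;   git commit -q -m "Fix the bug"
echo extra > extra.txt; git add extra.txt; git commit -q -m "Extra work"

# Dave's side: his own commit, then a cherry-pick of your fix.
git checkout -q develop
echo dave > dave.txt; git add dave.txt; git commit -q -m "Dave's work"
git cherry-pick topic~1 >/dev/null

# "-" marks a topic commit whose patch is already in develop,
# "+" marks one that is not.
report=$(git cherry -v develop topic)
echo "$report"
```

The fix shows up with a "-" and the extra commit with a "+", even though the fix has a different commit id on develop. A recorded base commit alone would not reveal that.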

Git focuses on storing content. When actions such as "merge" or "rebase" occur, it applies rules that "generally work" (and work well), but validating that the content is correct is always left to the user.