How to adjust parents of a Git commit permanently in history


I know how to solve this using traditional tools at hand, but I want to understand the possibilities and see whether things can be done more effectively.

In this example, I want to add B as a second parent of C.

main   ∙∙∙A---C---D         ∙∙∙A---C---D  
                       ->         /   
feat   ∙∙∙--B               ∙∙∙--B   

  • I am looking for a "permanent" history rewrite,
  • without writing "virtual" replace refs or the like.
  • The author (who may be someone other than me) and date of C and D must not change.
  • The solution must not rewrite unnecessary commits; in this case it may rewrite only C and D.

In short, I want to permanently write the result of something like the following into the repo as plain, ordinary commits, without relying on replace refs:

git replace --graft $commit $parent1 $parent2

I remember studying this topic in the plumbing and porcelain sections of the Git docs and spending far too much time on it, but I can't recall what I learned. What can I try next?
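For context, git replace --graft does not change C itself: it writes a new commit object plus a ref under refs/replace/, and Git substitutes that object whenever replace refs are honored. The sketch below shows C's parents with and without replace refs in a throwaway repository; the identity, dates and file names are arbitrary assumptions.

```shell
#!/bin/bash
# Sketch: a graft is only a replace ref; the original commit C stays untouched.
set -euo pipefail
tmp=$(mktemp -d)
trap 'cd /; rm -rf "$tmp"' EXIT
# Keep the user's own Git config (e.g. commit signing) out of the demo.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
# Arbitrary identity and fixed dates (assumptions).
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
export GIT_AUTHOR_DATE="2023-01-01T00:00:00+0000"
export GIT_COMMITTER_DATE="2023-01-01T00:00:00+0000"

git init -q "$tmp/repo"
cd "$tmp/repo"
for x in A C; do
    echo "file for commit $x" > "file-$x"
    git add "file-$x"
    git commit -q -m "$x"
done
A=$(git rev-parse HEAD~1)
C=$(git rev-parse HEAD)

# B: an unrelated root commit built with plumbing, leaving the worktree alone.
blob=$(echo "file for commit B" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tfile-B\n' "$blob" | git mktree)
B=$(git commit-tree -m B "$tree")

git replace --graft "$C" "$A" "$B"

# Seen through the replace ref: "C A B"
grafted=$(git rev-list --parents -n 1 "$C")
# The real object, with replace refs ignored: "C A"
original=$(git --no-replace-objects rev-list --parents -n 1 "$C")
echo "with replace refs:    $grafted"
echo "without replace refs: $original"
git for-each-ref refs/replace/
```

Anything that disables replace refs (or a clone that does not fetch refs/replace/) still sees the original single-parent C, which is why the graft alone is not a permanent rewrite.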


larsks

For this task we will need:

  • git cat-file
  • sed
  • git hash-object
  • git update-ref

Let's create a sample repository. This script sets things up so that we have repeatable commit ids and we don't run into any issues caused by weird local git configurations:

#!/bin/bash

set -e

HOME=$PWD
GIT_AUTHOR_NAME=Alice
GIT_AUTHOR_EMAIL=alice@example.com  # example address (assumption); commit IDs depend on it
GIT_AUTHOR_DATE="2023-01-01 00:00:00"
GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME
GIT_COMMITTER_EMAIL=$GIT_AUTHOR_EMAIL
GIT_COMMITTER_DATE=$GIT_AUTHOR_DATE
export HOME GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL,DATE}

workdir="$(mktemp -d "$PWD/gitXXXXXX")"
trap 'cd /; rm -rf "$workdir"' EXIT

cd "$workdir"

git config --global init.defaultBranch main
git init

for x in A C D; do
    echo "file for commit $x" > file-$x
    git add file-$x
    git commit -m "$x"
done

git checkout --orphan feat
git reset
echo "file for commit B" > file-B
git add file-B
git commit -m 'B'
git checkout -f main

PS1="git$ " bash --norc

That gets us:

git$ git log --oneline
6352bde (HEAD -> main) D
25635b4 C
79a5602 A
git$ git log --oneline feat
db65aa0 (feat) B

We can use git cat-file -p to dump the structure of a commit. We want to add a new parent to commit C, which looks like:

git$ git cat-file -p 25635b4
tree dfa1779e5574c1b6f1c9c9071aa1a820b1e03680
parent 79a56022dc4511577b0281bb034b56e0352d2e36
author Alice <[email protected]> 1672549200 -0500
committer Alice <[email protected]> 1672549200 -0500

C
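A quick sanity check on why this approach is safe: for a commit, git cat-file -p prints the raw object text, so hashing that text unchanged with git hash-object should reproduce the original ID. The sketch below tries this in a throwaway repository; the identity, dates and file name are arbitrary assumptions.

```shell
#!/bin/bash
# Sketch: cat-file -p | hash-object -t commit round-trips a commit object.
set -euo pipefail
tmp=$(mktemp -d)
trap 'cd /; rm -rf "$tmp"' EXIT
# Keep the user's own Git config (e.g. commit signing) out of the demo.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
# Arbitrary identity and fixed dates (assumptions).
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
export GIT_AUTHOR_DATE="2023-01-01T00:00:00+0000"
export GIT_COMMITTER_DATE="2023-01-01T00:00:00+0000"

git init -q "$tmp/repo"
cd "$tmp/repo"
echo "file for commit C" > file-C
git add file-C
git commit -q -m C

orig=$(git rev-parse HEAD)
# Without -w, hash-object only computes the ID; nothing is written.
rehashed=$(git cat-file -p "$orig" | git hash-object -t commit --stdin)
echo "original: $orig"
echo "rehashed: $rehashed"
```

If the two IDs match, any difference after an edit comes only from the lines that were changed.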

To make B a parent of this commit, we need to add a second parent line. We need the full commit id for commit B:

git$ git rev-parse feat
db65aa0d30cf551fdd25ad93d0c8e2f8da057572

We can add that as a parent of C using sed, like this:

git$ git cat-file -p 25635b4 | sed '/^parent / a\parent db65aa0d30cf551fdd25ad93d0c8e2f8da057572'
tree dfa1779e5574c1b6f1c9c9071aa1a820b1e03680
parent 79a56022dc4511577b0281bb034b56e0352d2e36
parent db65aa0d30cf551fdd25ad93d0c8e2f8da057572
author Alice <[email protected]> 1672549200 -0500
committer Alice <[email protected]> 1672549200 -0500

C

That looks right. Now we need to write that into the object database:

git$ git cat-file -p 25635b4 | sed '/^parent / a\parent db65aa0d30cf551fdd25ad93d0c8e2f8da057572' | git hash-object -t commit --stdin -w
a6db46299e550128fa8534dcc001f961ac4265c5

So now we have commit C' with commit id a6db46299e550128fa8534dcc001f961ac4265c5. We need to edit D to get D' with parent C', which is just a simple search/replace operation:

git$ git cat-file -p 6352bde  | sed 's/25635b41c4e003279c17c0cc50bf1e565b36ecfb/a6db46299e550128fa8534dcc001f961ac4265c5/' | git hash-object -t commit --stdin -w
96b1aea0c6c75f53b0ae45658b8ffdb3921560c0

Lastly, we need to update the main branch to point to D' as the new HEAD:

git$ git update-ref refs/heads/main 96b1aea0c6c75f53b0ae45658b8ffdb3921560c0

Now let's see what we have:

git$ git log --graph --pretty='%h (%s)%n' --abbrev-commit --date=relative --branches --all --decorate
* 96b1aea (D)
|
*   a6db462 (C)
|\
| |
| * db65aa0 (B)
|
* 79a5602 (A)

I think that's what you were after.
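One caveat with the sed-based edits: sed sees the commit message too, so a message line that matches the pattern (for example one starting with "parent ") would be edited as well. A slightly more defensive sketch, my own variant rather than part of the answer, edits only the header by keying on the first "author" line, then checks that nothing but the parents changed. It builds its own throwaway copy of the A, B, C, D example; the identity, dates and message bodies are assumptions.

```shell
#!/bin/bash
# Sketch (defensive variant): edit only the commit header, never the message.
set -euo pipefail
tmp=$(mktemp -d)
trap 'cd /; rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
# Arbitrary identity and fixed dates (assumptions).
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
export GIT_AUTHOR_DATE="2023-01-01T00:00:00+0000"
export GIT_COMMITTER_DATE="2023-01-01T00:00:00+0000"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/main
for x in A C D; do
    echo "file for commit $x" > "file-$x"
    git add "file-$x"
    # The body line starts with "parent " on purpose, to trip up a naive sed.
    git commit -q -m "$x" -m "parent notes for $x"
done
A=$(git rev-parse main~2)
C=$(git rev-parse main~1)
D=$(git rev-parse main)
blob=$(echo "file for commit B" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tfile-B\n' "$blob" | git mktree)
B=$(git commit-tree -m B "$tree")

# C': insert "parent $B" just before the first "author" line,
# i.e. after the existing parent(s) in the header.
C2=$(git cat-file -p "$C" |
    awk -v p="$B" '!done && /^author / { print "parent " p; done = 1 } { print }' |
    git hash-object -t commit --stdin -w)

# D': replace only the first "parent" line, which is D's header parent.
D2=$(git cat-file -p "$D" |
    awk -v p="$C2" '!done && /^parent / { print "parent " p; done = 1; next } { print }' |
    git hash-object -t commit --stdin -w)

# Move main, but only if it still points at D.
git update-ref refs/heads/main "$D2" "$D"

# Raw object minus its header "parent" lines (message lines are kept).
strip() {
    git cat-file -p "$1" | awk '!seen && /^author / { seen = 1 } seen || !/^parent /'
}
[ "$(strip "$C")" = "$(strip "$C2")" ] || { echo "C' differs from C" >&2; exit 1; }
[ "$(strip "$D")" = "$(strip "$D2")" ] || { echo "D' differs from D" >&2; exit 1; }
git log --graph --oneline main
```

Since D' has the same tree as D, moving main with update-ref leaves the checked-out files consistent.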


I wrote out all the steps here in detail, which makes this look enormous compared to the solution from @ElpieKay, but when we distill it down to the crucial commands we get:

#!/bin/bash

git update-ref refs/heads/main "$(
    git cat-file -p "$C" |
    sed "/^parent / a\parent $B" |
    git hash-object -t commit --stdin -w
)"

git update-ref refs/heads/main "$(
    git cat-file -p "$D" |
    sed "s/^parent .*/parent $(git rev-parse main)/" |
    git hash-object -t commit --stdin -w
)"

Set $B, $C and $D to the appropriate commit IDs. If you're working with the sample repository created by the script earlier in the post, and you save the commands above to a file named reparent-w-hash-object.sh, you can run:

eval $(git log --oneline --branches --pretty="%s=%H") \
  sh reparent-w-hash-object.sh
ElpieKay

In your example, we could use git commit-tree and git update-ref.

# Create the substitute of C (here A, B and C stand for the commit IDs)
s=$(git log -1 --pretty=%B C | GIT_AUTHOR_NAME=$(git log -1 --pretty=%an C) \
    GIT_AUTHOR_EMAIL=$(git log -1 --pretty=%ae C) \
    GIT_AUTHOR_DATE=$(git log -1 --pretty=%ad --date=iso C) \
    GIT_COMMITTER_NAME=$(git log -1 --pretty=%cn C) \
    GIT_COMMITTER_EMAIL=$(git log -1 --pretty=%ce C) \
    GIT_COMMITTER_DATE=$(git log -1 --pretty=%cd --date=iso C) \
    git commit-tree -p A -p B -F - C^{tree})

# Update main
git update-ref refs/heads/main $s

Create the substitute of D in the same way, with the new main (C') as its parent:

# D's own tree (main now points at C', whose tree is C's)
git update-ref refs/heads/main \
    $(git log -1 --pretty=%B D | GIT_AUTHOR_NAME=$(git log -1 --pretty=%an D) \
        GIT_AUTHOR_EMAIL=$(git log -1 --pretty=%ae D) \
        GIT_AUTHOR_DATE=$(git log -1 --pretty=%ad --date=iso D) \
        GIT_COMMITTER_NAME=$(git log -1 --pretty=%cn D) \
        GIT_COMMITTER_EMAIL=$(git log -1 --pretty=%ce D) \
        GIT_COMMITTER_DATE=$(git log -1 --pretty=%cd --date=iso D) \
        git commit-tree -p main -F - D^{tree})

When we create the substitute, we reuse the commit message, the author name and date, the committer name and date, and the tree object referenced by the commit. Only the parents are changed.

git log -1 --pretty=%B prints the commit message, which is piped to git commit-tree.

The GIT_* environment variables set the author and committer identity and dates.

-p A and -p B specify the parents. Their order matters: A becomes the first parent and B the second.

-F - reads the commit message from stdin.

C^{tree} means to reuse the tree object referenced by C, so that C and its substitute have the same directories and files.

Note that the substitute of C made this way is a hand-crafted commit: its files are not the result of actually merging A and B; we simply reuse C's files. If you want a real merge, use git merge to create the merge commit first and then rewrite that merge commit with C's metadata.
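The commit-tree recipe can also be wrapped in a small function. This is a sketch with a hypothetical helper name, copy_with_parents; it additionally carries over the author and committer emails, reads dates in Git's raw "seconds offset" form (--date=raw), and takes the message verbatim from the raw commit object instead of from %B. The throwaway repository, identity and dates are assumptions.

```shell
#!/bin/bash
# Sketch: the commit-tree recipe as a reusable (hypothetical) function.
set -euo pipefail
tmp=$(mktemp -d)
trap 'cd /; rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
# Arbitrary identity and fixed dates (assumptions).
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
export GIT_AUTHOR_DATE="2023-01-01T00:00:00+0000"
export GIT_COMMITTER_DATE="2023-01-01T00:00:00+0000"

git init -q "$tmp/repo"
cd "$tmp/repo"
for x in A C; do
    echo "file for commit $x" > "file-$x"
    git add "file-$x"
    git commit -q -m "$x"
done
A=$(git rev-parse HEAD~1)
C=$(git rev-parse HEAD)
blob=$(echo "file for commit B" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tfile-B\n' "$blob" | git mktree)
B=$(git commit-tree -m B "$tree")

# copy_with_parents COMMIT PARENT...   (hypothetical helper)
# Prints the ID of a new commit with COMMIT's tree, message, author and
# committer (name, email, date), but with the given parents in that order.
copy_with_parents() {
    local c=$1 p
    shift
    local args=()
    for p in "$@"; do args+=(-p "$p"); done
    # The message is everything after the first blank line of the raw object.
    git cat-file commit "$c" | sed '1,/^$/d' |
        GIT_AUTHOR_NAME=$(git log -1 --format=%an "$c") \
        GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$c") \
        GIT_AUTHOR_DATE=$(git log -1 --format=%ad --date=raw "$c") \
        GIT_COMMITTER_NAME=$(git log -1 --format=%cn "$c") \
        GIT_COMMITTER_EMAIL=$(git log -1 --format=%ce "$c") \
        GIT_COMMITTER_DATE=$(git log -1 --format=%cd --date=raw "$c") \
        git commit-tree "${args[@]}" -F - "$c^{tree}"
}

C2=$(copy_with_parents "$C" "$A" "$B")
git log -1 --format='%H%n parents: %P%n author: %an <%ae> %ad' "$C2"
```

For a commit without extra headers (such as a GPG signature), this should produce the same object as adding the parent line by hand and using git hash-object; D would then be copied with copy_with_parents "$D" "$C2".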

LeGEC

Also worth knowing:

If you have replaced commits, running git filter-branch or git filter-repo will "set in stone" these replacements:
all replaced commits and their descendants will be rewritten.

So running git filter-repo --partial (with no specific action) or git filter-branch --index-filter true will create actual commits matching the replacement rules you provided.

To avoid having such commands act on the complete history of your branch, you may want to narrow down the range of commits to rewrite, for example:

# if commit C is part of several branches, you need to name them all here
# (you can also move them manually afterwards, but it could be more cumbersome)
git filter-repo --partial --refs=A..HEAD
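To make the "set in stone" behaviour concrete, here is a sketch that grafts B onto C with git replace --graft and then runs git filter-branch over A..main; the git filter-branch documentation notes that it honors refs/replace/ and makes such replacements permanent. It uses filter-branch rather than filter-repo only because filter-repo is a separate install. The throwaway repository, identity and dates are assumptions, and FILTER_BRANCH_SQUELCH_WARNING=1 only silences filter-branch's deprecation notice and pause.

```shell
#!/bin/bash
# Sketch: make a graft permanent with git filter-branch, then drop the graft.
set -euo pipefail
tmp=$(mktemp -d)
trap 'cd /; rm -rf "$tmp"' EXIT
export HOME="$tmp" XDG_CONFIG_HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
# Arbitrary identity and fixed dates (assumptions).
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
export GIT_AUTHOR_DATE="2023-01-01T00:00:00+0000"
export GIT_COMMITTER_DATE="2023-01-01T00:00:00+0000"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/main
for x in A C D; do
    echo "file for commit $x" > "file-$x"
    git add "file-$x"
    git commit -q -m "$x"
done
A=$(git rev-parse main~2)
C=$(git rev-parse main~1)
blob=$(echo "file for commit B" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tfile-B\n' "$blob" | git mktree)
B=$(git commit-tree -m B "$tree")
git update-ref refs/heads/feat "$B"

# 1. Virtual re-parenting: C now appears to have parents A and B.
git replace --graft "$C" "$A" "$B"
# 2. Rewrite only what lies after A; the replacement is read as real history.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- "$A..main"
# 3. Drop the replace ref and filter-branch's backup ref.
git replace -d "$C"
git update-ref -d refs/original/refs/heads/main

# Inspect with replace refs ignored, to confirm the merge is a real commit.
git --no-replace-objects log --graph --format='%h %s (%an, %ad)' main
```

B itself is listed by the range (it becomes reachable through the graft) but is rewritten to an identical commit, so only C and D get new IDs.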