Git - remove all history prior to a specific commit


I use git for various projects (personal repositories only), and I want to do some housekeeping.

I have a downloaded git project tree that has a large history of commits. After downloading I made a few more myself. However, I do not need anything apart from the latest commit at the time I downloaded it, and the subsequent commits that I made. All the prior commits take up a lot of space, and I'd like to get rid of them.

What I should have done is delete the .git folder after download and create a new personal repository going forward - but I didn't.

So my question is this: can I clean up the repository so that everything prior to commit X is removed, as if it had never existed, while the subsequent commits are kept? If so, how? Also, if there were multiple branches at that time, can I remove the other branches as well?

(Not sure if this is possible as I think one of git's claims is how hard it is to lose old data by mistake).

There are 4 best solutions below

Antonio Petricca On

I suggest squashing your local commits as follows:

git log --oneline

# Note the hash of the commit just before your first commit

git rebase -i <commit-hash>

# Now a text editor will open, so change **pick** into **squash** for the second commit and following, then save and exit editor...

Now all your new commits will be combined into a single commit.

You are ready to push it.

Here is a short tutorial.
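
The interactive edit described above can also be scripted, which makes it easy to try on a throwaway repository first. The following is only a sketch under assumptions that are not part of the answer: it builds a scratch repository with placeholder commits c1 to c4, assumes GNU sed (for `sed -i`), and uses `GIT_SEQUENCE_EDITOR` to change pick into squash and `GIT_EDITOR=true` to accept the combined commit message unchanged.

```shell
set -eu

# Throwaway repository; the path, identity and commit names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# c1 stands in for the last downloaded commit, c2..c4 for your own commits.
for i in 1 2 3 4; do
    echo "change $i" >> notes.txt
    git add notes.txt
    git commit -q -m "c$i"
done

# The commit just before your first own commit (here: c1).
base=$(git rev-parse HEAD~3)

# Keep "pick" on the first todo line and turn every later "pick" into "squash".
# GIT_EDITOR=true accepts the proposed combined commit message as-is.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -i "$base"

commit_count=$(git rev-list --count HEAD)
echo "commits after squash: $commit_count"
```

In this scratch setup the three commits after c1 end up as one commit on top of c1. Note that this only combines the newer commits; everything up to and including c1 stays in the history.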

Roland Smith On

This is what I tested:

  • Make a backup of your repo first.
  • Find the oldest commit (e.g. with git log --reverse).
  • Run git rebase -i <oldest-commit>, and mark all commits except those you want to keep as drop.
  • Remove all remotes (e.g. git remote remove origin).
  • Run git reflog expire --all --expire=now.
  • Run git gc --aggressive.

If you run git fsck before and after these steps, you should see that the number of objects is significantly reduced.
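
One way to put numbers on the before/after comparison is `git count-objects -v`, which reports loose and packed object counts. The sketch below is an illustration only: the scratch repository, the branch name `scratch`, and the helper `total_objects` are made up for the example, and `--prune=now` is added so that unreachable objects are removed immediately rather than after the default grace period. It assumes a git recent enough to support `git init -b` and `git switch`.

```shell
set -eu

# Hypothetical helper: sum of loose ("count") and packed ("in-pack") objects.
total_objects() {
    git count-objects -v | awk '/^(count|in-pack):/ { n += $2 } END { print n }'
}

# Throwaway repository with placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "kept" > kept.txt
git add kept.txt
git commit -q -m "kept commit"

# A side branch whose commit becomes garbage once the branch is deleted.
git switch -q -c scratch
echo "discarded" > discarded.txt
git add discarded.txt
git commit -q -m "discarded commit"
git switch -q main
git branch -q -D scratch

before=$(total_objects)

# The HEAD reflog still references the deleted commit, so expire it first.
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

after=$(total_objects)
echo "objects before: $before, after: $after"
```

The count should drop because the commit, tree and blob that were only reachable from the deleted branch are pruned.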

TTT On

I have a downloaded git project tree that has a large history of commits. After downloading I made a few more myself.

Since you've only made "a few more" that you wish to keep, I'm going to assume your "new" history is linear. If that's the case, then this is extremely easy to do. For this example we'll assume the branch you want to keep is called main:

# make sure your status is clean
git status # verify it's "nothing to commit, working tree clean"

# Figure out your first commit ID
git log --reverse # let's call the first result ID <repo-root-commit-id>

# Figure out the commit you started from (parent of your first new commit)
git log # let's call the starting commit X, as stated in the question

# Make a new temp branch from the commit you started from (commit X)
git switch -c temp-branch X

# soft reset to the repo root commit
git reset --soft <repo-root-commit-id>

# Now the entire history from initial commit through X will be staged
# Make all of this a single commit
git commit -m "Squash repo history into a single commit"

# Now rebase all of your new commits onto the temp branch
git rebase --onto temp-branch X main

# Now your rewritten main branch is as desired, delete the temp branch
git branch -d temp-branch
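
To see the whole sequence run end to end before touching a real repository, here is a sketch on a throwaway repo. The file names, commit messages and identity are placeholders, and `git rev-list --max-parents=0` is used as a scripted stand-in for reading the root commit off `git log --reverse`. It assumes a linear history, as above, and a git recent enough to support `git init -b` and `git switch`.

```shell
set -eu

# Throwaway repository with placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# Three "downloaded" commits; the last one plays the role of commit X.
for i in 1 2 3; do
    echo "upstream $i" >> upstream.txt
    git add upstream.txt
    git commit -q -m "upstream $i"
done
X=$(git rev-parse HEAD)

# Two commits of your own on top of X.
for i in 1 2; do
    echo "mine $i" >> mine.txt
    git add mine.txt
    git commit -q -m "mine $i"
done
tree_before=$(git rev-parse 'HEAD^{tree}')

# Root commit of the (linear) history.
root=$(git rev-list --max-parents=0 HEAD)

# Squash everything after the root commit up to X into one commit.
git switch -q -c temp-branch "$X"
git reset -q --soft "$root"
git commit -q -m "Squash repo history into a single commit"

# Replay your own commits (X..main) onto the squashed history.
git rebase -q --onto temp-branch "$X" main
git branch -q -d temp-branch

tree_after=$(git rev-parse 'HEAD^{tree}')
commits_after=$(git rev-list --count main)
echo "commits on main: $commits_after"
```

Comparing `tree_before` with `tree_after` is a cheap sanity check that the rewritten main has exactly the same files as before; only the history is shorter (root commit, squash commit, then your own commits).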

Since your goal is to recover space used by the old history, you can remove your remote, delete all local branches except main, and either garbage collect now or re-clone your new repo to another place. These steps are summarized here:

# Remove the remote:
git remote remove origin

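# Delete all local branches except main
# (exact-name match, so a branch such as "maintenance" is not skipped by accident)
git branch --format='%(refname:short)' | grep -vx main | xargs git branch -D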

# Garbage Collect everything now
git reflog expire --expire=now --all
git gc --aggressive --prune=now

nmw01223 On

Thanks for all the comments, particularly mkreiger1.

That led me to a post about git clone SRC DEST --depth=nn. That did it and saved about 90% of the space.

Since it is a local clone, it is necessary to prefix SRC with file://; otherwise --depth is ignored.

I also noticed that it has a .github folder, as opposed to .git. Not sure why, but all relevant history seems present.
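
For anyone wanting to check the file:// behaviour, here is a small sketch on throwaway directories. The paths, identity, commit messages and the depth of 2 are placeholders chosen for the example; in practice the depth would be the number of your own commits plus one for the commit you started from, assuming a linear history.

```shell
set -eu

# Throwaway source repository and a separate directory for the clones.
src=$(mktemp -d)
dest_root=$(mktemp -d)
cd "$src"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for i in 1 2 3 4 5; do
    echo "change $i" >> notes.txt
    git add notes.txt
    git commit -q -m "commit $i"
done

# Plain path: git treats this as a local clone and ignores --depth (it prints a warning).
git clone -q --depth=2 "$src" "$dest_root/plain"

# file:// URL: the clone goes through the regular transport, so --depth applies.
git clone -q --depth=2 "file://$src" "$dest_root/shallow"

plain_count=$(git -C "$dest_root/plain" rev-list --count HEAD)
shallow_count=$(git -C "$dest_root/shallow" rev-list --count HEAD)
echo "plain clone: $plain_count commits, file:// clone: $shallow_count commits"
```

The file:// clone keeps only the two most recent commits, while the plain-path clone still carries the full history.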