I have a shallow cloned git repository that is over 1 GB. I use sparse checkout for the files/dirs needed.
How can I reduce the repository clone to just the sparse checkout files/dirs?
Initially I was able to limit the cloned repository to only the sparse checkout by disabling checkout when cloning, then setting up sparse checkout before doing the initial checkout. This limited the repository to only about 200 MB, which is much more manageable. However, updating remote branch info at some point in the future causes the rest of the files and dirs to be included in the repository clone, sending the clone size back to over 1 GB, and I don't know how to keep just the sparse checkout files and dirs.
In short what I want is a shallow AND sparse repository clone. Not just sparse checkout of a shallow repo clone. The full repo is a waste of space and performance for certain tasks suffers.
Hope someone can share a solution. Thanks.
Shallow and sparse means "partial" or "narrow".
A partial clone (or "narrow clone") is in theory possible, and was implemented first in Dec 2017 with Git 2.16, as seen here.
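As a rough sketch of what a shallow, partial and sparse clone can look like with a recent Git (2.25 or later, for "git sparse-checkout"), the script below builds a throwaway local repository as a stand-in for the real remote, then clones it with --depth, --filter=blob:none and --no-checkout before restricting the working tree. The names origin-repo, needed, bulky and narrow are made up for illustration, and the uploadpack.* settings are only there because the stand-in remote is a local path; a hosted server has to allow filters on its own side.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the real remote: one wanted dir, one unwanted dir.
git init -q origin-repo
git -C origin-repo config uploadpack.allowFilter true
git -C origin-repo config uploadpack.allowAnySHA1InWant true
mkdir -p origin-repo/needed origin-repo/bulky
echo "small" > origin-repo/needed/a.txt
echo "large" > origin-repo/bulky/b.txt
git -C origin-repo add .
git -C origin-repo -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# Shallow (--depth) and partial (--filter) clone, with no files checked out yet.
git clone -q --depth=1 --filter=blob:none --no-checkout \
    "file://$work/origin-repo" narrow

# Restrict the working tree before the first checkout.
git -C narrow sparse-checkout init --cone
git -C narrow sparse-checkout set needed

# First checkout: only blobs inside the sparse area are fetched on demand.
branch=$(git -C narrow symbolic-ref --short HEAD)
git -C narrow checkout -q "$branch"

ls narrow
```

The clone records the filter in its configuration (remote.origin.partialclonefilter), so a later "git fetch --depth=1" in that clone should keep omitting blobs outside the sparse area rather than pulling everything back in.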
But:
That is further optimized in Git 2.20 (Q4 2018), since in a partial clone that will lazily be hydrated from the originating repository, we generally want to avoid "does this object exist (locally)?" on objects that we deliberately omitted when we created the (partial/sparse) clone.
The cache-tree codepath (which is used to write a tree object out of the index) however insisted that the object exists, even for paths that are outside of the partial checkout area.
The code has been updated to avoid such a check.
See commit 2f215ff (09 Oct 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit a08b1d6, 19 Oct 2018)
With Git 2.24 (Q4 2019), the cache-tree code has been taught to be less aggressive in attempting to see if a tree object it computed already exists in the repository.
See commit f981ec1 (03 Sep 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit ae203ba, 07 Oct 2019)
With Git 2.25 (Q1 2020), "git fetch" codepath had a big "do not lazily fetch missing objects when I ask if something exists" switch.
This has been corrected by marking the "does this thing exist?" calls with "if not please do not lazily fetch it" flag.
See commit 603960b, commit e362fad (13 Nov 2019), and commit 6462d5e (05 Nov 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit fce9e83, 01 Dec 2019)
And:
See more with "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee
Before Git 2.25.1 (Feb. 2020), has_object_file() said "no" given an object registered to the system via pretend_object_file(), making it inconsistent with read_object_file(), causing lazy fetch to attempt fetching an empty tree from promisor remotes.
See discussion.
See commit 9c8a294 (02 Jan 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit e26bd14, 22 Jan 2020)
Git 2.25.1 will also warn programmers about pretend_object_file() that allows the code to tentatively use in-core objects.
See commit 60440d7 (04 Jan 2020) by Jonathan Nieder (artagnon).
(Merged by Junio C Hamano -- gitster -- in commit b486d2e, 12 Feb 2020)
So the comment is now:
Git 2.25.1 (Feb. 2020) includes a future-proofing change to make sure a test does not depend on the current implementation detail.
See commit b54128b (13 Jan 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 3f7553a, 12 Feb 2020)
Git 2.25.2 (March 2020) fixes a bug revealed by a recent change to make the protocol v2 the default.
See commit 3e96c66, commit d0badf8 (21 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 444cff6, 02 Mar 2020)
Fix:
The logic to auto-follow tags by "git clone --single-branch" was not careful to avoid lazy-fetching unnecessary tags, which has been corrected with Git 2.27 (Q2 2020).
See commit 167a575 (01 Apr 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 3ea2b46, 22 Apr 2020)
Before Git 2.27 (Q2 2020), serving a "git fetch" client over "git://" and "ssh://" protocols using the on-wire protocol version 2 was buggy on the server end when the client needs to make a follow-up request to e.g. auto-follow tags.
See commit 08450ef (08 May 2020) by Christian Couder (chriscool).
(Merged by Junio C Hamano -- gitster -- in commit a012588, 13 May 2020)
The pretend-object mechanism checks if the given object already exists in the object store before deciding to keep the data in-core, but before Git 2.29 (Q4 2020) that check would have triggered lazy fetching of such an object from a promisor remote.
See commit a64d2aa (21 Jul 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 5b137e8, 04 Aug 2020)
Before Git 2.37 (Q3 2022), "git mktree --missing" lazily fetched objects that are missing from the local object store, which was totally unnecessary for the purpose of creating the tree object(s) from its input.
See commit 817b0f6 (21 Jun 2022) by Richard Oliver (RichardBray).
(Merged by Junio C Hamano -- gitster -- in commit 6fccbda, 13 Jul 2022)
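To check which objects a partial clone has deliberately left out, without triggering the lazy fetch that the fixes above try to avoid, "git rev-list --objects --missing=print" can list them. The following is only a sketch on a throwaway local repository; the names src and partial and the two sample files are assumptions for illustration, and uploadpack.allowFilter is set only because the stand-in remote is a local path.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in remote with two files of distinct content.
git init -q src
git -C src config uploadpack.allowFilter true
echo one > src/one.txt
echo two > src/two.txt
git -C src add .
git -C src -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# Partial clone: commits and trees are transferred, blobs are omitted.
git clone -q --filter=blob:none --no-checkout "file://$work/src" partial

# Missing objects are printed with a leading "?" and are not fetched.
git -C partial rev-list --objects --missing=print HEAD

# Count the omitted objects.
missing=$(git -C partial rev-list --objects --missing=print HEAD | grep -c '^?')
echo "objects not present locally: $missing"
```

Running the same command before and after an operation gives a quick way to see whether that operation pulled in objects from the promisor remote.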