How to prevent git clone --filter=blob:none --sparse from downloading files in the root directory?


As explained at "How do I clone a subdirectory only of a Git repository?", the best way I've found so far to download only the files in one subdirectory of a Git repository is:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git sparse-checkout set small

This is meant to download only the small/ directory.

However, as soon as I run:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/cirosantilli/test-git-partial-clone-big-small

any files (but not directories) present in the root directory are downloaded and appear in the working tree. In the case of that test repo, I get the unwanted file:

generate.sh

How can I prevent that from happening, so that I obtain only the subdirectories I'm interested in, without any root directory files?

I've checked other repositories, e.g. https://github.com/torvalds/linux, and having a large number of small files at the top level does not significantly slow down the download (e.g. by fetching them one by one), so this would only be a problem if there are large files at the top level.

Tested on Git 2.37.2, Ubuntu 22.10, February 2023.

There is 1 best solution below

Do your clone with --no-checkout (aka -n), then set up your sparsity rules exactly as you want before checking out. To get really minimal clone traffic, use --filter=tree:0 instead of --filter=blob:none. Smoke test:

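# Clone without populating the working tree (-n), fetching no trees or blobs up front
git clone -n --depth=1 --filter=tree:0 \
        https://github.com/cirosantilli/test-git-partial-clone-big-small
# In an interactive bash or zsh shell, cd !$:t:r is a shorthand for this
cd test-git-partial-clone-big-small
# Non-cone pattern: match directories only, so root-level files are left out
git sparse-checkout set --no-cone '*/'
git checkout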
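
To keep a single directory rather than every directory, the same approach should work with a narrower non-cone pattern such as '/small/'. The following is only a sketch that can be tried offline: it builds a throwaway local repository whose layout (generate.sh, small/a.txt, big/b.txt) is made up to resemble the test repo, and it assumes a Git version recent enough to support --no-cone. The two uploadpack settings are only there so that a local repository can serve a filtered clone.

```shell
#!/bin/sh
# Hypothetical local demo: all paths and file names here are illustrative.
set -eu
work=$(mktemp -d)

# Build a throwaway "server" repository with one root file and two directories.
git init -q "$work/server"
cd "$work/server"
mkdir small big
echo root  > generate.sh
echo small > small/a.txt
echo big   > big/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
# Allow this local repository to serve partial (filtered) clones.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Clone without checking anything out, then select only small/.
cd "$work"
git clone -q -n --depth=1 --filter=tree:0 "file://$work/server" clone
cd clone
git sparse-checkout set --no-cone '/small/'
git checkout -q

# List the files that ended up in the working tree.
find . -path ./.git -prune -o -type f -print
```

If the pattern behaves as intended, generate.sh and big/ are absent from the working tree of the clone. Against the real repository, replace the file:// URL with the GitHub one and drop the server setup.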