We use Git for multiple projects that together use hundreds of submodules. Most projects use the same submodules, but when they are cloned to the local disk, every project receives a full clone of each submodule it uses.
This results in high network transfer and hard disk space usage.
Is there a way to define reference repositories for all those git repos?
As the URLs of the submodules can be anything, maybe a SHA1 hash of each URL could serve as a folder name.
I'm thinking of a command like this:
git clone --reference-if-able d:\GitRefRepos\"sha1(<URL>)" --recursive <URL>
or, better, as config:
git config use-reference-if-able.folder d:\GitRefRepos\
git config use-reference-if-able.url2folder SHA1
git clone --recursive <URL>
I'd expect this to reduce hard disk space usage and network transfer time, because all projects would share the same submodule reference repo.
Yes, this is doable and is easier than the method you are proposing.
There are some custom scripts you can find that aid in the process of setting up and managing a git "cache", but the main point is that you can set up a single folder (if desired) as a reference bare repo. It can hold the "objects" of any number of repos.
You would then --reference that repo directory when cloning, and it will lead to reduced network transfer time and hard disk space usage, as you desire.
For submodules, it looks like you can simply pass the same --reference option to git submodule update, which will then be passed to the git clone command for the submodule.
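As a rough sketch of the above (not a tested recipe for your setup), the two steps could be wrapped in small shell helpers. The function names cache_add and clone_with_cache, the remote names, and the cache path are made up for illustration; only git init --bare, git remote, git fetch, git clone --reference and git submodule update --reference are actual Git commands and options.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the cache layout are assumptions.

# cache_add CACHE NAME URL
# Registers URL as remote NAME in the shared bare repo CACHE and fetches
# its objects, creating the bare repo on first use.
cache_add() {
    cache=$1; name=$2; url=$3
    [ -d "$cache" ] || git init --quiet --bare "$cache"
    git -C "$cache" remote add "$name" "$url" 2>/dev/null ||
        git -C "$cache" remote set-url "$name" "$url"
    git -C "$cache" fetch --quiet "$name"
}

# clone_with_cache CACHE URL DEST
# Clones URL into DEST, borrowing objects from CACHE, then initializes the
# submodules with the same reference repo.
clone_with_cache() {
    cache=$1; url=$2; dest=$3
    git clone --quiet --reference "$cache" "$url" "$dest" &&
        git -C "$dest" submodule update --init --reference "$cache"
}
```

You would call cache_add once per project or submodule URL (for example cache_add d:/GitRefRepos/cache.git libfoo <URL>, where libfoo is an arbitrary remote name), re-run it from time to time to refresh the cache, and then use clone_with_cache d:/GitRefRepos/cache.git <URL> myproject for each working copy. Keep in mind that a clone made with --reference depends on the cache: if the cache is deleted or objects are pruned from it, the clone can break, unless it was made with --dissociate (which gives up the disk savings).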