Put temporary files for $ git clone ... # not in CWD?

We're using $ git clone -b releases/gcc-13.2.0 git://gcc.gnu.org/git/gcc.git # on a Samba filesystem, and noticed that git creates lots of temporary files in CWD. This causes performance problems, as the data for this one $ git clone ... # is transferred over the wire multiple times.

Is there any option to store these files in $TMPDIR instead (e.g. /tmp, a tmpfs on Linux/Solaris/Illumos, i.e. a local filesystem backed by RAM)?

Answered by bk2204

In general, Git will create lots of temporary files in order to support atomic behaviour and locking over a variety of file systems, including ones which don't support POSIX locking (like SMB/CIFS). If Git did not do this, it would be vulnerable to race conditions which could corrupt the repository.

While Git doesn't always write files in the working tree this way, it does write virtually every other file to a temporary file, which it then atomically renames into place. This is true for objects, refs, and pretty much everything else in .git.

(I should point out that while I'm not familiar with the details of SMB/CIFS, it would be a terrible protocol design mistake to send the data over the connection again to perform a rename, so hopefully the cost of the rename on your system is just another protocol message, which might be more acceptable.)

These temporary files cannot, in general, be in $TMPDIR because $TMPDIR need not be on the same file system as the repository, and it's not possible to rename a file across file systems (on Unix, rename fails with EXDEV). Even though mv works in that case, it essentially makes a copy and then deletes the old file, which is not atomic and therefore opens the possibility of corruption. Git doesn't permit the contents of .git to cross file systems, so it's always safe to perform those renames there.
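To illustrate the write-then-rename pattern described above, here is a minimal shell sketch. It is not Git's actual code: atomic_write is a hypothetical helper, and it assumes a mktemp that accepts a template argument (as on GNU and BSD systems). The temporary file is created next to the target, so the final mv stays on one file system and is carried out as a rename.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): replace "$1" with the contents of stdin.
atomic_write() {
    target=$1
    dir=$(dirname -- "$target")
    # Create the temporary file in the target's own directory, so that both
    # names are on the same file system and the rename cannot hit EXDEV.
    tmp=$(mktemp "$dir/.tmp.XXXXXX") || return 1
    # Write the full contents first, then rename over the target.
    if cat > "$tmp" && mv -f -- "$tmp" "$target"; then
        return 0
    fi
    # Something failed: clean up and leave the old target untouched.
    rm -f -- "$tmp"
    return 1
}

# Usage: readers see either the old or the new contents, never a partial file.
workdir=$(mktemp -d)
printf 'old\n' > "$workdir/config"
printf 'new\n' | atomic_write "$workdir/config"
cat "$workdir/config"
```

Had the temporary file been created in $TMPDIR on a different file system, mv would fall back to copy-and-delete, which is exactly the non-atomic case described above.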

Git does use $TMPDIR for files that are handed to external programs, such as diff tools, and that do not need to end up in the repository. This is the right decision, since it allows the use of all the benefits that $TMPDIR normally offers.

If you want fewer temporary files created on the share, you can place the .git directory elsewhere with --separate-git-dir. This creates what's called a gitfile, which is a .git file (instead of a directory) that points to the real location. However, the path stored there is absolute, so the actual .git directory must not be moved afterwards, not even by mounting it at a different location. You must also not use the working tree while the .git directory is not mounted, because even things like your shell prompt can invoke commands such as git status, and Git will be very unhappy if it cannot find its data.
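As a rough sketch of that setup, the following uses a throwaway local repository in place of the real remote so it can be run anywhere. The directory names (share standing in for the Samba mount, local_store for a local disk) are made up for illustration; in practice you would pass the gcc URL and your own paths.

```shell
#!/bin/sh
# Throwaway source repository standing in for the real remote.
src=$(mktemp -d)
git init -q "$src"
git -C "$src" -c user.name=example -c user.email=example@example.invalid \
    commit -q --allow-empty -m 'initial commit'

# Hypothetical locations: "share" plays the Samba mount, "local_store" a local disk.
share=$(mktemp -d)
local_store=$(mktemp -d)

# The working tree goes to the share; the repository data goes to local storage.
git clone -q --separate-git-dir="$local_store/repo.git" "$src" "$share/work"

# .git in the working tree is now a small file holding an absolute "gitdir:" path.
cat "$share/work/.git"
```

With this layout, the temporary files Git creates for objects and refs land under the local directory, while only the checked-out files are written to the share.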

In general, my recommendation is not to use SMB or CIFS for Git data, because Git expects something resembling POSIX functionality from its file systems (for example, a read occurring after a write must return the written data), especially on Unix. Even some NFS servers are known to be broken, since Git expects to be able to create a file with 0400 permissions while opening it read-write, and some NFS servers don't handle that correctly. Of course, your mileage may vary.