Getting full blame history all at once for a given file

I'm working on a program that analyzes git blame history over time, starting with the first commit of a given file all the way to HEAD, along a given branch.

Currently, the way I'm doing it is:

  1. Use git log --pretty='%H %ad' --date=unix <branch> to get a list of every commit on the branch.
  2. For each commit in that list, individually, use git blame --date=unix --minimal -l -e -w <commit> <filename> and parse the results.
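For concreteness, the two steps above can be sketched roughly as follows. This is only an illustration of the approach as described, not my actual program: the function names and the `repo_dir` parameter are made up for the example, and the `--` separator before the filename is an addition to keep paths from being read as revisions.

```python
import subprocess


def log_command(branch):
    """Step 1: argv that lists every commit on the branch as '<hash> <unix time>'."""
    return ["git", "log", "--pretty=%H %ad", "--date=unix", branch]


def blame_command(commit, filename):
    """Step 2: argv that blames one file as of one commit."""
    return ["git", "blame", "--date=unix", "--minimal", "-l", "-e", "-w",
            commit, "--", filename]


def parse_log_output(text):
    """Turn the step 1 output into a list of (commit_hash, unix_timestamp) pairs."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        commit_hash, timestamp = line.split()
        commits.append((commit_hash, int(timestamp)))
    return commits


def blame_every_commit(repo_dir, branch, filename):
    """Spawn one git blame process per commit and collect the raw output."""
    log = subprocess.run(log_command(branch), cwd=repo_dir, check=True,
                         capture_output=True, text=True).stdout
    results = {}
    for commit_hash, _timestamp in parse_log_output(log):
        proc = subprocess.run(blame_command(commit_hash, filename), cwd=repo_dir,
                              capture_output=True, text=True)
        # git blame exits non-zero for commits where the file does not exist yet.
        if proc.returncode == 0:
            results[commit_hash] = proc.stdout
    return results
```

Calling `blame_every_commit` once per file is what produces the one-process-per-(file, commit) pattern.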

The problem is that this takes a long time, and I'm doing it for every file in a repo, across multiple repos. So the worst case for a given repo is something like O(number_of_files * number_of_commits) git invocations. Much of the time goes to spawning git processes: for a tiny repo with a few dozen files and a few hundred commits it takes almost 3 minutes (to run git about 16,000 times), and it's already fully parallelized.
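As a back-of-the-envelope check on those numbers, here is a small sketch. The 40 files and 400 commits are hypothetical values picked only because they match "a few dozen" and "a few hundred" and multiply out to about 16,000; they are not measured from the repo.

```python
def blame_invocations(num_files, num_commits):
    """Worst case: one git blame process per (file, commit) pair."""
    return num_files * num_commits


def seconds_per_invocation(total_seconds, invocations):
    """Average wall-clock cost of one git process, with parallelism already included."""
    return total_seconds / invocations


# Hypothetical repo size, chosen only to match the rough figures in the text.
runs = blame_invocations(num_files=40, num_commits=400)
print(runs)                                   # 16000
print(seconds_per_invocation(3 * 60, runs))   # average seconds per git process
```

That works out to roughly 11 ms of wall-clock time per git process, so dropping to one process per file would remove most of the invocations in this estimate.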

My question: is there a way to get the complete blame history of a given file (every change, including lines that were changed multiple times across multiple commits) in a single git command, still one file at a time, so that I can cut down the time this takes? I'd like to reduce it to O(number_of_files) invocations. This is my first optimization target; I just haven't been able to figure out whether there's a way to do it.

I looked at the output from git blame --incremental but, unless I'm misreading (I didn't do a proper comparison so I might be wrong here), it still only gives the blames for the most recent changes, not every change at once.
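To make that comparison easier, here is a minimal sketch of a parser for the `--incremental` format, based on my reading of the git-blame documentation: each entry starts with a `<sha> <orig line> <final line> <num lines>` header, commit metadata is printed only the first time a commit appears, and a `filename` line ends the entry. The function name and the dictionary keys are my own choices for illustration.

```python
import re

# Entry header: <sha> <line in original file> <line in final file> <number of lines>.
# 40 hex digits for SHA-1 repos, 64 for SHA-256 repos.
HEADER = re.compile(r"^([0-9a-f]{40}|[0-9a-f]{64}) (\d+) (\d+) (\d+)$")


def parse_incremental(text):
    """Parse `git blame --incremental` output into a list of entry dicts."""
    entries = []
    commit_info = {}  # metadata per commit; git prints it only on first appearance
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            sha, orig_line, final_line, count = match.groups()
            current = {"commit": sha, "orig_line": int(orig_line),
                       "final_line": int(final_line), "num_lines": int(count)}
            commit_info.setdefault(sha, {})
            continue
        if current is None:
            continue
        key, _, value = line.partition(" ")
        if key == "filename":
            # The filename line terminates the entry.
            current["filename"] = value
            current["info"] = commit_info[current["commit"]]
            entries.append(current)
            current = None
        else:
            commit_info[current["commit"]][key] = value
    return entries
```

If I'm reading the format correctly, the `num_lines` values of all entries add up to the line count of the file at the blamed revision, with each line attributed to exactly one commit. That would match the impression that only the most recent change per line is reported.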

Is it possible to do this and, if so, how?
