FFmpeg is considered by many to be the work of Fabrice Bellard, and maybe even his magnum opus, but since he stopped contributing to the project (under the pseudonym Gérard Lantau) in 2004, I wondered how much of it can actually still be said to be his. By comparison, Linus Torvalds' Wikipedia page states:
As of 2006, approximately 2% of the Linux kernel was written by Torvalds himself.[28] Because thousands have contributed to it, his percentage is still one of the largest. However, he said in 2012 that his own personal contribution is now mostly merging code written by others, with little programming.
This despite the fact that Torvalds is still an active contributor to the Linux kernel, whereas Bellard hasn't been an active contributor to FFmpeg for almost two decades.
FFmpeg being an open-source project tracked with Git, it seems like the question should be technically and objectively answerable, but as someone who hates mailing lists and the generally archaic ways that big open-source projects like to do things, I wouldn't know where to start in doing so.
Just how much of the modern FFmpeg codebase is Fabrice Bellard actually responsible for, in comparison to the other FFmpeg devs?
TL;DR
Using git blame, you can conclude that Bellard is the person who last touched 8851 of the 1942819 lines in the code base, or 0.46% of them.
Details
With some 8000 files in the repo containing a total of nearly 2 million lines, running `git blame` on each file will take a long time, but it would let you see how many lines still in the repo were contributed by Bellard/Lantau. As @Gyan says, though, this will only report lines that are exactly as he wrote them; any change in whitespace or style will be attributed to the person who made those trivial changes.
That being said, here's the loop:
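A minimal sketch of such a loop, assuming a POSIX-style shell started at the root of the FFmpeg checkout and an `xargs` that supports `-0` (GNU and BSD both do). The function name `blame_all_files` and the default output file `blame-output` are hypothetical names chosen for illustration, not necessarily what was used for the numbers in this answer:

```shell
#!/bin/sh
# Run `git blame` on every tracked file and append the output to one file.
# blame_all_files and the default name "blame-output" are hypothetical.
blame_all_files() {
    # Start from an empty output file.
    : > "${1:-blame-output}"
    # `git ls-files -z` prints tracked paths separated by NUL bytes, so
    # names with spaces survive; xargs then runs one `git blame` per path.
    # Paths that cannot be blamed only produce an error message, which is
    # discarded here.
    git ls-files -z |
        xargs -0 -n 1 git blame -- >> "${1:-blame-output}" 2>/dev/null
}

# Usage, from the repository root:
#   blame_all_files blame-output
```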
That loop will take a long time to run (it took about 5 hours on my computer), but eventually you'll be able to extract the author from each line with something like this:
That's based on parsing lines from the blame output that look like this:
My crude parser is not perfect, but it's enough to get statistics out of a crude tool like blame.
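The extraction step described above can be sketched as follows. This is not the parser used for the numbers in this answer but a rough equivalent, assuming the default `git blame` line layout (abbreviated hash, optional original file name, then author, timestamp, and line number in parentheses). The function name `extract_authors` and the sample lines in the comments are made up for illustration:

```shell
#!/bin/sh
# Print the author field of each default-format `git blame` line on stdin.
# Layout assumed for each line (illustrative samples, not real FFmpeg data):
#   1a2b3c4d (Alice Example 2001-07-22 14:18:56 +0000   1) /* text */
#   5e6f7a8b old/name.c (Bob Sample 2010-01-02 03:04:05 +0100 120) x = f(1);
# Lines that do not fit this layout (e.g. binary junk) are skipped.
extract_authors() {
    LC_ALL=C awk '{
        # The author starts right after the first opening parenthesis.
        lp = index($0, "(")
        if (lp == 0) next
        # The author ends where the padding before the leftmost
        # "date time timezone lineno)" group begins.
        if (match($0, / +[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [+-][0-9][0-9][0-9][0-9] +[0-9]+\)/) && RSTART > lp + 1)
            print substr($0, lp + 1, RSTART - lp - 1)
    }'
}
```

For example, `extract_authors < blame-output > blame-author` would produce one author name per analyzed line, where `blame-output` is a hypothetical name for the file holding the raw blame output. Like any parser of this kind, it can be fooled by file names or author names that contain an opening parenthesis.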
Let's count lines by authors, now:
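A minimal sketch of that count, assuming the file `blame-author` holds one author name per line. The function names `count_by_author` and `author_share` are hypothetical, and the pipeline is a common shell idiom rather than necessarily the exact command behind the figures in this answer:

```shell
#!/bin/sh
# Read one author name per line on stdin and print "count name",
# sorted from the most lines to the fewest.
count_by_author() {
    LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -rn
}

# Print how many input lines belong to the author given as $1,
# together with the percentage of all input lines.
author_share() {
    LC_ALL=C awk -v name="$1" '
        $0 == name { n++ }
        END {
            if (NR > 0)
                printf "%d of %d lines (%.2f%%)\n", n, NR, 100 * n / NR
        }'
}

# Usage:
#   count_by_author < blame-author | head -n 50
#   author_share "Fabrice Bellard" < blame-author
```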
The result is the list of contributors to the code base, ranked from high to low by the number of lines last touched by each contributor according to the commit logs. Here's the top 50 list:
In this list, you can find Bellard in position 38, with 8851 lines, or 0.46% of the 1942819 lines that `wc -l blame-author` says were analyzed.
Methodological limitations
I should have removed `tests/ref` and `tests/reference.pnm` from my processing, since those are a lot of binary files, but without them there are still 1.8M lines, so the answer remains around 0.4 to 0.5%.
Even better, I should have identified and filtered out all binary files. My `blame-author` file has some binary lines due to them. Again, I believe it's a minor error, but it's there nonetheless.
The four `COPYING.*GPL*` files are included, but were obviously not written by whoever committed them. That's only 1680 lines, but credit is given for committing something, not for actually writing it. `git blame` is a crude tool. 492 of those lines are attributed to Bellard himself, so leaving them out would reduce the estimate of his surviving contribution to about 0.43% of the code base.
`git blame` can accept a `--ignore-revs-file FILENAME` option, where the file lists commits that only apply style changes. E.g., I use that in my repos to exclude the commits where I am just reformatting Python code with black, or you could use it to ignore commits that only change CRLF to LF line endings in some files. I did not try to find style-only commits in FFmpeg, but one could improve the significance of these statistics by doing so.
I didn't see the name Lantau anywhere, so I assume all of Bellard's contributions are recorded under the name Bellard.
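Two of the refinements above can be sketched in shell. This is an illustration under assumptions, not something that was run for the numbers in this answer: `list_text_files` and `blame_ignoring_style` are hypothetical helper names, the excluded paths are just the ones mentioned above, `grep -I` (GNU and BSD grep) serves as a rough text/binary test, and `--ignore-revs-file` needs Git 2.23 or newer.

```shell
#!/bin/sh
# Print tracked files that look like text, skipping the paths named in
# the limitations above. Must be run from the repository root.
list_text_files() {
    git ls-files |
        while IFS= read -r f; do
            case $f in
                tests/ref/*|tests/reference.pnm|COPYING.*GPL*) continue ;;
            esac
            # With -I, grep treats binary files as non-matching, so only
            # non-empty text files pass this check.
            if [ -f "$f" ] && grep -Iq . "$f"; then
                printf '%s\n' "$f"
            fi
        done
}

# Blame one file while skipping the commits listed in a revs file.
# Usage: blame_ignoring_style REVS_FILE FILE
blame_ignoring_style() {
    git blame --ignore-revs-file "$1" -- "$2"
}
```

The revs file contains one full commit hash per line, and lines starting with `#` are treated as comments. `.git-blame-ignore-revs` is a commonly used name for it, though any name works.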
For future reference, should anyone actually care, my analysis is based on this commit, which is the HEAD of the master branch at the moment of writing: