I would like to show only changes to the column headers of a CSV file tracked by Git. I use the code in this nice answer by Kirill Müller. It works almost perfectly, except that it repeats lines even when the commit didn't actually change the first line of the file.
Reproducible code
cd /tmp/
mkdir test
cd test/
git init
echo "bla,bla" > table.csv
git add table.csv
git commit -m "version bla"
echo "bla,bli" > table.csv
git commit -am "version bli"
echo "1,2" >> table.csv
git commit -am "Add data"
Issue
user:/tmp/test$ FILE=table.csv
user:/tmp/test$ LINE=1
user:/tmp/test$ git log --format=format:%H $FILE | xargs -L 1 git blame $FILE -L $LINE,$LINE
e4a89a75 (user 2022-08-10 16:45:04 +0200 1) bla,bli
e4a89a75 (user 2022-08-10 16:45:04 +0200 1) bla,bli
^58b4b88 (user 2022-08-10 16:44:16 +0200 1) bla,bla
The issue is that the second commit (e4a89a75) appears twice: it is printed once more for the last commit ("Add data"), even though that commit didn't change the first line.
Expected output
e4a89a75 (user 2022-08-10 16:45:04 +0200 1) bla,bli
^58b4b88 (user 2022-08-10 16:44:16 +0200 1) bla,bla
What I tried
The log part of the instruction currently uses format:%H
user:/tmp/test$ git log --format=format:%H table.csv
c51873404aa45fb50fcbd6bd7ea06ab1e9f22071
e4a89a75e48623a1d2967996e6de3a250607e6a5
58b4b88800dd57cb1ca0476f1b9939781af28600
I tried adding the -L1,1:table.csv argument to the log command, but it changes the log format, so the output can no longer be used as input to xargs
user:/tmp/test$ git log --format=format:%H -L1,1:table.csv
e4a89a75e48623a1d2967996e6de3a250607e6a5
diff --git a/table.csv b/table.csv
--- a/table.csv
+++ b/table.csv
@@ -1,1 +1,1 @@
-bla,bla
+bla,bli
58b4b88800dd57cb1ca0476f1b9939781af28600
diff --git a/table.csv b/table.csv
--- /dev/null
+++ b/table.csv
@@ -0,0 +1,1 @@
+bla,bla
Putting the log on one line may not be possible when using -L according to this answer:
"[...] git log --oneline -L 10,11:example.txt does work (it does however output the full patch)."
(First, big thanks for the reproducer—it was helpful—but one note: watch out, your quotes got mangled into "smart quotes" instead of plain double quotes. I fixed them.)
Based on the example, by "column headers" I take it you mean "line 1".
The basic problem starts with the left-hand side of your pipeline, git log --format=format:%H $FILE:
This finds, and prints the hash ID of, each occurrence of a commit that changes anything in the given file. (FILE needs to be set to table.csv here.) This is not at all what you want! Its only function is to completely skip any commit where the file is entirely unchanged (which could be a useful function in real-world examples, but not so much in your reproducer, since every commit changes the file here).

(Side note: whenever it's possible, use git rev-list instead of git log. It's possible here. However, we're going to end up discarding git log / git rev-list anyway. But see footnote / separate section below.)

Then there is the right-hand side of the pipeline, xargs -L 1 git blame $FILE -L $LINE,$LINE. (Here, LINE needs to be set to 1.) The general idea here seems to be to run git blame on one specific line (in this case line 1), which is fine as far as it goes, but isn't really what we want. If our left-side command, git log ... $FILE, had selected just the revisions we want, those would already be the revisions we want and we could just stop here.

The real trick here is to run git blame repeatedly but only until the blame "runs out". Each invocation of git blame should tell us who / which commit is "responsible for" (i.e., produced this version of) the given line, and that's exactly what git blame does. You give it a starting (or ending? Git works backwards, so we start at the end and work backwards) revision, and Git checks that version and the previous commit to see if the line in question changed in that version. If so, we're done: we print that version and the line. If not, we put the previous version in place and repeat. We do this until we run out of "previous versions", in which case we just print this version and stop.

So git blame is already doing what you want. The only problem is that it stops after it finds the "previous version" to print. So what we really want is to build a loop: run git blame, print what it finds, then run it again starting from the version before the one it found, and stop once there is no earlier version.

The way to deal with this is to use --porcelain (or --incremental, but --porcelain seems most appropriate here). We know that -L 1,1 (or -L $LINE,$LINE) is going to output a single line at the end. We want to collect the remaining lines. The output from --porcelain is described in the documentation: it's a series of lines with, in our case, the first and last being of interest, and the middle ones might be interesting, or might not, except that previous or boundary is always of interest.

Shell parsing is kind of messy, so it's probably best to use some other language to handle the output from git blame. For instance, we might use a small Python program. This one doesn't have many features but shows how to use --porcelain here, and should be easy to modify. It has been very lightly tested (and run through black for formatting and mypy for type checking), but definitely needs better error handling. For instance, running it with a nonexistent pathname gets you a fatal error message, but then a Python traceback. I leave the cleanup to someone else, at this point.

[Edit: this program badly needs a few checks for when Git doesn't run or git blame does not find the file or line. In particular, proc.stdout.readline() gets end-of-file and returns an empty string. Use with caution, fix it up, or don't use it at all.]

Using git log directly

This may not have the output format you want, but note that git log can do just what you want without having to write a bunch of new code, e.g. git log --oneline -L1,1:table.csv (or leave out the --oneline if you like). The -L directive takes two line numbers and a file name, or various other option formats, and does the same "find commits that modify the file" search that you were using git log table.csv for in the first place, but restricts the output still further, to show only those commits where the specified lines change.

Add --no-patch and an appropriate set of format directives, and you can get the commit hash IDs and whatever else you like, and then use some program to extract the lines from the specific files (e.g., git cat-file -p rev:path | sed -n -e "$line{p;q;}").

Note that git log is what Git calls a porcelain command (vs git rev-list or git blame --porcelain acting as what Git calls a plumbing command). Porcelain commands generally obey Git configurations, such as the settings for color.ui, core.pager, and log.pager, and settings like log.decorate. This makes them hard to use from other programs, as it's hard to know whether something will be colorized (with ESC [ 31 m sequences for instance). Plumbing programs behave in a well-defined manner so that other programs can know exactly what input to expect. This is why we normally want to use git rev-list rather than git log when writing scripts, if we're doing something that both commands can do.
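Going back to the git blame loop described earlier, here is one way it could be sketched in Python. This is not the program referred to above, only a minimal illustration under stated assumptions: the names BlameEntry, parse_porcelain and line_history are made up for this example, the line number is assumed to stay the same in every older version (true for a header on line 1, not in general), the blame output is assumed to decode as text, and path names are assumed to contain nothing that Git would quote. Error handling is limited to stopping when git blame fails.

```python
import subprocess
from typing import Iterator, NamedTuple, Optional


class BlameEntry(NamedTuple):
    """Fields of interest from `git blame --porcelain` output for a single line."""

    commit: str
    text: str
    previous: Optional[str]  # hash from the "previous" header, if any
    previous_path: Optional[str]  # file name in that previous commit


def parse_porcelain(output: str) -> BlameEntry:
    """Parse porcelain blame output that covers exactly one line of the file."""
    lines = output.split("\n")
    if not lines[0].strip():
        raise ValueError("empty git blame output")
    # First line: <hash> <original-line> <final-line> <number-of-lines>
    commit = lines[0].split()[0]
    text = ""
    previous: Optional[str] = None
    previous_path: Optional[str] = None
    for entry in lines[1:]:
        if entry.startswith("\t"):
            # The line's contents are prefixed with a single TAB.
            text = entry[1:]
        elif entry.startswith("previous "):
            # "previous <hash> <file name>"; absent for a boundary (e.g. root) commit.
            _, previous, previous_path = entry.split(" ", 2)
    return BlameEntry(commit, text, previous, previous_path)


def line_history(path: str, line: int = 1, rev: str = "HEAD") -> Iterator[BlameEntry]:
    """Yield one entry per commit that produced a version of the given line."""
    current_rev: Optional[str] = rev
    current_path = path
    while current_rev is not None:
        proc = subprocess.run(
            ["git", "blame", "--porcelain", "-L", f"{line},{line}",
             current_rev, "--", current_path],
            capture_output=True,
            text=True,
        )
        if proc.returncode != 0 or not proc.stdout:
            # Git failed (no such file, line out of range, ...): stop quietly.
            break
        entry = parse_porcelain(proc.stdout)
        yield entry
        # Restart the blame just before the commit that was found, or stop
        # when Git reports no "previous" version.
        current_rev = entry.previous
        current_path = entry.previous_path or current_path
```

Run from inside the repository, something like `for e in line_history("table.csv"): print(e.commit[:8], e.text)` should print each version of line 1 once, newest first. Keeping the parsing in its own function means it can be checked against saved blame output without running Git at all.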