sort on pipe-delimited fields not behaving as expected


Consider this tiny text file:

ab
a

If we run it through sort(1), we get

a
ab

because of course a comes before ab.

But now consider this file:

ab|c
a|c

If we run it through sort -t'|', we again expect the a line to sort before the ab line, but it does not! (Try it under your version of Unix and see.)

What I think is happening here is that the -t option to sort is not really delimiting fields -- it may be changing the way (say) the start of field 2 would be found, but it's not changing the way field 1 ends. a|c sorts after ab|c because '|' comes after 'b' in ASCII. (It's as if the -t'|' argument is ignored, because you get the same result without it.)
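A quick way to check this hypothesis is sketched below. It assumes a POSIX shell and forces the C locale with LC_ALL=C so that comparison is byte-wise; in other locales the collation rules may differ, although the order of these two lines may well come out the same. The variable names are only illustrative.

```shell
# With -t but no -k, the whole line is still the sort key.
# LC_ALL=C forces plain byte comparison (an assumption made for reproducibility).
whole_line=$(printf 'ab|c\na|c\n' | LC_ALL=C sort -t'|')
printf '%s\n' "$whole_line"
# ab|c
# a|c

# Byte values of 'b' and '|': the leading quote asks printf for the character code.
bytes=$(printf '%d %d' "'b" "'|")
printf '%s\n' "$bytes"
# 98 124
```

Since 98 < 124, the line ab|c compares lower than a|c at the second character, which matches the output seen above.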

So is this a bug in sort or in my understanding of it? And is there a way to sort on the first pipe-delimited field properly?

This question came up in my attempt to answer another SO question, "Join Statement omitting entries".

Best answer

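With no -k option, sort uses the whole line as the sort key; -t only tells sort how to split fields for the keys you name with -k, so on its own it changes nothing. Likewise, a key given as -k1 with no end position runs from field 1 to the end of the line. If you want it to sort on field 1 first, then field 2, you need to give each key an explicit start and end: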

$ sort -k1,1 -k2,2 -t'|' <<< $'ab|c\na|c'
a|c
ab|c
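
The <<< here-string above is a bash feature. A portable variant might look like the following sketch, which assumes only a POSIX shell and uses LC_ALL=C to make the comparison byte-wise (drop it if locale-aware ordering is wanted). It also restricts the key to field 1 alone, which is enough to answer the original question; the variable name is only illustrative.

```shell
# -k1,1 ends the key at the end of field 1, so "a" is compared with "ab"
# and the '|' delimiter never takes part in the key comparison.
by_field1=$(printf 'ab|c\na|c\n' | LC_ALL=C sort -t'|' -k1,1)
printf '%s\n' "$by_field1"
# a|c
# ab|c
```

When two lines have equal keys, sort falls back to comparing the whole lines, so adding -k2,2 only matters if you want field 2 compared on its own terms (for example numerically, with -k2,2n).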