Unix sort emits lines outside of expected order only when second column exists

I have a file with two columns. The first column contains two underscore-separated numbers, and I want to sort the file lexicographically by this column. If there is no second column, the default sort does precisely what I want:

$ { echo 211_284; for ((i=2840;i<=2842;++i)); do echo 211_$i; done; echo 211_284; } | sort -k1
211_284
211_284
211_2840
211_2841
211_2842

But if I add a second column (which should be irrelevant to the sort!):

$ { echo 211_284 X; for ((i=2840;i<=2842;++i)); do echo 211_$i Y; done; echo 211_284 Z; } | sort -k1
211_2840 Y
211_2841 Y
211_2842 Y
211_284 X
211_284 Z

Or even adding a second column to just one of the rows:

$ { echo 211_284 X; for ((i=2840;i<=2842;++i)); do echo 211_$i; done; echo 211_284; } | sort -k1
211_284
211_2840
211_2841
211_2842
211_284 X

How do I sort on the first column, for real?

There is 1 best solution below

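If you want to ignore anything other than the first column, use sort -k1,1. With plain -k1 you're specifying a start column but not an end column, so the sort key runs from the first field to the end of the line and the second column takes part in the comparison.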

Also, if you don't want your locale's collation order to affect how digits and spaces compare in the lexicographic sort, set LC_ALL=C explicitly (or, more narrowly, LC_COLLATE=C).

$ { echo 211_284 X; for ((i=2840;i<=2842;++i)); do echo 211_$i Y; done; echo 211_284 Z; } \
>   | LC_ALL=C sort -k1,1
211_284 X
211_284 Z
211_2840 Y
211_2841 Y
211_2842 Y
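
One further detail, as a hedged sketch rather than part of the original answer: even with -k1,1, rows whose first column is identical are not left in input order, because sort falls back to comparing the whole line to break ties. The -s (stable) option, available in GNU and BSD sort, is assumed here to turn that fallback off. The three-row input and the variable names (input, by_key, by_key_stable) are made up for illustration.

```shell
# Hypothetical sample input: the Z row deliberately comes before the X row.
input=$(printf '%s\n' '211_2840 Y' '211_284 Z' '211_284 X')

# -k1,1 limits the key to field 1; rows with equal keys are then
# ordered by a last-resort comparison of the entire line.
by_key=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1)

# -s (stable) disables the last-resort comparison, so rows with
# equal keys keep the order they had in the input.
by_key_stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k1,1)

printf '%s\n---\n%s\n' "$by_key" "$by_key_stable"
```

Under these assumptions the first result lists 211_284 X before 211_284 Z (the second column broke the tie), while the stable variant keeps 211_284 Z first, as in the input. In both cases 211_2840 Y comes last.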