I have a file with two columns. The first column has two underscore-separated numbers in it, and I want to sort the file lexicographically by this column. Now, if there is no second column, default sort does precisely what I want:
$ { echo 211_284; for ((i=2840;i<=2842;++i)); do echo 211_$i; done; echo 211_284; } | sort -k1
211_284
211_284
211_2840
211_2841
211_2842
But if I add a second column (which should be irrelevant to the sort!):
$ { echo 211_284 X; for ((i=2840;i<=2842;++i)); do echo 211_$i Y; done; echo 211_284 Z; } | sort -k1
211_2840 Y
211_2841 Y
211_2842 Y
211_284 X
211_284 Z
Or even adding a second column to just one of the rows:
$ { echo 211_284 X; for ((i=2840;i<=2842;++i)); do echo 211_$i; done; echo 211_284; } | sort -k1
211_284
211_2840
211_2841
211_2842
211_284 X
How do I sort on the first column, for real?
If you want to ignore anything other than the first column, use
sort -k1,1
Otherwise, you're specifying a start column but not an end column, so the sort key runs from the first field to the end of the line.

Also, if you don't want your locale's collation order to affect how digits and spaces compare in the lexicographic sort, set
LC_ALL=C
explicitly (or, more narrowly, LC_COLLATE=C).
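
As a rough sketch of how this plays out on the second example from the question (the function name sort_first_col is just a made-up helper for illustration, not anything standard), limiting the key to the first field and forcing the C locale should give the order you were after:

```shell
# sort_first_col is a hypothetical helper name, not a standard command.
# It sorts stdin using only the first whitespace-separated field as the key,
# with byte-wise (C locale) comparison so the locale's collation rules
# don't change how digits, underscores and spaces compare.
sort_first_col() {
  LC_ALL=C sort -k1,1
}

# Same data as in the question, with the second column present.
{
  echo "211_284 X"
  for i in 2840 2841 2842; do echo "211_$i Y"; done
  echo "211_284 Z"
} | sort_first_col
# Output:
# 211_284 X
# 211_284 Z
# 211_2840 Y
# 211_2841 Y
# 211_2842 Y
```

The two 211_284 lines tie on the key, so sort falls back to comparing the whole lines, which is why X lands before Z. If you'd rather keep the input order for ties, GNU and BSD sort accept -s for a stable sort.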