Here it looks like the space after the 3 in both rows breaks the numerical sorting and lets the alphabetic sorting kick in, so that 11 < 2:
$ echo -e '3 2\n3 11' | sort -n
3 11
3 2
In man sort, I read:
-s, --stable stabilize sort by disabling last-resort comparison
which implies that without -s a last-resort comparison is done (between ties, because -s does not affect non-ties).
So the question is: how is this last-resort comparison accomplished? A reference to the source code would be welcome, if necessary to answer the question.
This answer deduces, from experimentation, that the sorting of ties is lexicographic.
Does the standard/POSIX say anything about this?
Question: How is the last resort comparison done?
This is quickly answered in the documentation of GNU coreutils:
This means that the last resort compares the entire lines according to the collating order of LC_COLLATE, i.e. lexicographically (mostly).
POSIX, on the other hand, adds a final ultra-last-resort comparison, which is stricter.
I am not certain whether this is implemented in GNU sort, since it is not a requirement. Nonetheless, POSIX strongly recommends it (see the last paragraph of the Rationale).
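To see the last-resort comparison in action, here is a minimal sketch using the OP's input. It assumes GNU sort (the -s option is not specified by POSIX) and forces the C locale so that ties are broken by plain byte comparison. The variable names are only illustrative.

```shell
# Both lines have the numeric key 3 (the number ends at the first blank),
# so they tie under -n.
input='3 2
3 11'

# Default: the tie is broken by the last-resort comparison of the whole lines.
# In the C locale "3 11" sorts before "3 2" because '1' < '2'.
with_last_resort=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -s disables the last resort: tied lines keep their input order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -n -s)

printf 'sort -n:\n%s\n\n' "$with_last_resort"
printf 'sort -n -s:\n%s\n' "$stable"
```

With -s the lines come out in input order (3 2, then 3 11). Without it, the whole-line comparison puts 3 11 first.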
What does this mean in case of the OP?
There is a common misunderstanding of the key definitions. Assume you do something like
It is often understood that sort will first sort on field 1, then 2, and finally 3, using --option. This is incorrect. It will define the key as the substring consisting of fields 1 through 3. And in case two lines collate equally, sort will perform the last-resort comparison (see earlier).
Using GNU sort, you can see which substring is used for the sort. This is done with the --debug option. Here you see the difference between 3 simple cases:
When you do a numeric sort (using -n or -g), sort will attempt to extract a number from the key (1234abc leads to 1234) and use that number for the sorting.
As you notice in these two cases, even though the first field can be ordered lexicographically (3a < 3b), it is ignored, as we only pick the number from the key.
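The two points above can be sketched as follows, assuming GNU sort in the C locale. The sample inputs and variable names are made up for illustration and are not the cases from the original answer. The first pair of commands shows that -n uses only the leading number of the key. The second pair shows that giving each field its own key (-k1,1n -k2,2n) yields the order the OP expected.

```shell
# Numeric sort uses only the leading number of the key: 999 < 1234,
# although "1234abc" sorts before "999zzz" byte-wise.
mixed='1234abc
999zzz'
lexical=$(printf '%s\n' "$mixed" | LC_ALL=C sort)
numeric=$(printf '%s\n' "$mixed" | LC_ALL=C sort -n)

# The OP's input: with the whole line as one numeric key, both lines
# evaluate to 3 and the tie goes to the last-resort comparison.
rows='3 2
3 11'
whole_line=$(printf '%s\n' "$rows" | LC_ALL=C sort -n)

# One numeric key per field: field 1 ties, field 2 decides (2 < 11).
per_field=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1,1n -k2,2n)

printf 'sort:\n%s\n\n' "$lexical"
printf 'sort -n:\n%s\n\n' "$numeric"
printf 'sort -n (whole line):\n%s\n\n' "$whole_line"
printf 'sort -k1,1n -k2,2n:\n%s\n' "$per_field"
```

Appending --debug to any of these GNU sort commands marks the part of each line that is actually used as the key.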