Float comparison in awk and mawk

I cannot understand why the float number comparison does not work in mawk:

mawk '$3 > 10' file.txt
[...]
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_7_F   3196    3.68367
9_9_F   2278    2.37445
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775
[...]

While it works perfectly in awk like this:

awk '{if ($3 > 10) print $1}' file.txt

I'm obviously doing something wrong here, but I cannot understand what.

There are 2 best solutions below

James Brown On BEST ANSWER

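It fails if the file has CRLF line terminators: the trailing \r ends up as part of $3, so the field no longer looks like a number and mawk appears to compare it as a string. Remove the \r first: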

$ file foo
foo: ASCII text, with CRLF line terminators
$ mawk '{ sub(/\r$/, "") } $3 > 10' foo
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775

Alternatively you could use dos2unix or such.
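
If dos2unix is not installed, tr can serve as the "or such". The following is only a sketch: the sample file and its contents are made up for illustration, and plain awk is used so it runs anywhere (substitute mawk as needed). It first counts how many records still end in a CR, then deletes every CR in the stream before filtering:

```shell
# Hypothetical CRLF sample; names and values are invented for this sketch.
tmp=$(mktemp)
printf 'a 1 38.8699\r\nb 2 3.68367\r\nc 3 163.557\r\n' > "$tmp"

# Count the records that end in a carriage return.
cr_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmp")

# Delete every CR on the fly, then do the numeric comparison.
result=$(tr -d '\r' < "$tmp" | awk '$3 > 10 { print $1 }')

echo "records ending in CR: $cr_lines"
echo "$result"
rm -f "$tmp"
```

The input file itself is left untouched, since the CRs are only removed in the pipeline.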

EDIT2: If you are using a locale that has a comma as the decimal separator, it affects float comparisons in mawk.

In this case you can either:

1) set locale to

LANG="en_US.UTF-8"

or

2) change the decimal separators to commas and pipe the result to mawk:

sed 's/\./,/' file.txt | mawk '$3 > 10'
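
A narrower variant of option 1, assuming the comma locale really is the cause, is to override the locale for the single command rather than for the whole session. LC_ALL takes precedence over LANG and LC_NUMERIC, and the C locale uses a dot as the decimal separator. The sample data below is made up, and plain awk stands in for mawk:

```shell
# Run one command under the C locale; the shell's own locale is unchanged.
result=$(printf 'a 1 38.8699\nb 2 3.68367\n' | LC_ALL=C awk '$3 > 10 { print $1 }')
echo "$result"
```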
RARE Kpop Manifesto On

You don't need to set the locale, but you do need to account for strange or erroneous input:

If the input has a dot, or any character that has a byte value higher than ASCII "1" (which is a LOT of characters):

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  :5.333

this fails to produce the correct result, since $3 is compared as a string, and ASCII "9" sorts after ASCII "1":

mawk2 'sub("\r*",_)*(10<$3)'

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  :5.333

To rectify it, simply put a unary + in front of $3 to force a numeric comparison:

mawk 'sub("\r*",_)*(10<+$3)'
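
To see the difference the unary + makes, here is a small sketch with invented sample rows (plain awk under the C locale, so the string ordering is by byte value). Without the +, a field that does not look numeric is compared as a string; with it, the field is converted to a number first, and a non-numeric field such as ":5.333" becomes 0:

```shell
# Three hypothetical rows: a non-numeric field, a number above 10, a number below 10.
sample=$(printf 'x 1 :5.333\ny 2 38.8699\nz 3 9\n')

# Plain comparison: ":5.333" is compared as a string and sorts after "10".
plain=$(printf '%s\n' "$sample" | LC_ALL=C awk '10 < $3 { print $1 }')

# Forced numeric comparison: ":5.333" converts to 0, so only the real number passes.
forced=$(printf '%s\n' "$sample" | LC_ALL=C awk '10 < +$3 { print $1 }')

echo "plain:  $plain"
echo "forced: $forced"
```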

If you don't care much for the archaic gawk -P/-c/-t modes, then it's even simpler:

mawk '10<+$3' RS='\r?\n'

Let RS and ORS take care of the \r (CR) on your behalf. By making the \r optional with ? in the RS regex, you can skip all the steps about using iconv or dos2unix or changing locale settings:

  • RS --> ORS handles it seamlessly: each record is split on \r?\n and printed with the default ORS of \n

This way the original input file remains intact, in case you need those CRs later for some reason.
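
A quick way to check the RS approach is to feed it mixed line endings. This sketch assumes an awk that accepts a regular expression as RS (mawk and gawk do; it is an extension beyond what POSIX guarantees), and the sample rows are made up:

```shell
# Two CRLF-terminated rows and one LF-terminated row, all hypothetical.
out=$(printf 'a 1 38.8699\r\nb 2 3.68367\r\nc 3 163.557\n' |
      awk -v RS='\r?\n' '10 < +$3')

# Matching rows come back with plain LF endings because ORS defaults to "\n".
printf '%s\n' "$out"
```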