Regex "^[[:digit:]]$" not working as expected in AWK/GAWK


My GAWK version on RHEL is:

gawk-3.1.5-15.el5

I want to print a line only if its first field consists entirely of digits (no special characters, not even a space).

Example:

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^[[:digit:]]$/)  print $0}'

Output:
Nothing

Expected Output:
123456789012345,3

What is going wrong here? Does my AWK version not understand the POSIX character classes? Kindly help.

3

There are 3 best solutions below

5
On BEST ANSWER

The pattern ^[[:digit:]]$ matches a field that consists of exactly one digit, so a 15-digit field never matches. To match several digits, add a + after the [[:digit:]] character class, which means "one or more digits" in $1.

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]+)$/)  print $0}'
123456789012345,3

which satisfies your requirement.
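To make the difference concrete, here is a minimal sketch that runs both patterns over the same two sample lines (the sample input is made up for illustration). Without a quantifier only the single-digit field passes; with + both do.

```shell
# Without a quantifier the bracket expression matches exactly one character,
# so only a first field that is a single digit passes.
single=$(printf '7,3\n123,3\n' | awk -F, '$1 ~ /^[[:digit:]]$/')

# With + the first field may be one or more digits.
multi=$(printf '7,3\n123,3\n' | awk -F, '$1 ~ /^[[:digit:]]+$/')

printf 'without +:\n%s\n' "$single"
printf 'with +:\n%s\n' "$multi"
```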

A more idiomatic way (as suggested in the comments) is to drop the explicit if and print and use the match directly as the pattern; awk prints every line for which a pattern without an action is true:

echo "123456789012345,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
123456789012345,3

Some more examples which demonstrate the same,

echo "a1,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'

(and)

echo "aa,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'

do NOT produce any output, as per the requirement.

Another POSIX-compliant way to require an exact number of digits is an interval expression, where {3} means exactly three repetitions. gawk releases before 4.0 (including 3.1.5) only enable interval expressions with --posix or --re-interval.

echo "123,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'
123,3

(and)

echo "12,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'

does not produce any output.
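If interval expressions are unavailable, or you would rather not depend on --posix, one possible alternative is to combine the digits-only test with length(). This is a sketch, not part of the answer above, and the sample lines are invented for illustration.

```shell
# Keep lines whose first field is all digits AND exactly three characters long.
# No interval expression is needed, so no --posix / --re-interval flag either.
out=$(printf '123,3\n12,3\n1234,3\nab3,3\n' |
      awk -F, '$1 ~ /^[0-9]+$/ && length($1) == 3')

printf '%s\n' "$out"
```

Only the 123,3 line meets both conditions; the others are too short, too long, or contain a letter.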

If you are using a relatively recent version of the bash shell, it has a native regex operator, =~, which accepts POSIX character classes as above, something like

#!/bin/bash

while IFS=',' read -r row1 row2
do
   [[ $row1 =~ ^([[:digit:]]+)$ ]] && printf "%s,%s\n" "$row1" "$row2"
done < file

For an input file named file,

$ cat file
122,12
a1,22
aa,12

The script produces,

$ bash script.sh
122,12

Although this works, bash regex matching can be slow. A relatively straightforward alternative using string manipulation would be something like

while IFS=',' read -r row1 row2
do
   [[ -z "${row1//[0-9]/}" ]] && printf "%s,%s\n" "$row1" "$row2"
done < file

The "${row1//[0-9]/}" expansion strips all the digits from row1, and the condition becomes true only if no other characters are left in the variable.
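One edge case to keep in mind: an empty first field also leaves nothing behind after stripping, so the -z test accepts it. A possible variant that rejects empty fields and avoids bash-only syntax is a case statement. This is a sketch; the function name is_all_digits and the sample input are hypothetical.

```shell
# Hypothetical helper: succeeds only for a non-empty string made of digits.
is_all_digits() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    *)           return 0 ;;
  esac
}

# Filter sample rows; the ",5" row has an empty first field and is dropped.
out=$(printf '122,12\na1,22\n,5\naa,12\n' |
      while IFS=, read -r row1 row2; do
        if is_all_digits "$row1"; then
          printf '%s,%s\n' "$row1" "$row2"
        fi
      done)

printf '%s\n' "$out"
```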

0
On

Could you please try the following and let me know if it helps?

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]*)$/)  print $0}'

EDIT: The above code can be shortened as follows.

echo "123456789012345,3" | awk -F, '($1 ~ /^[[:digit:]]*$/)'
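
Note that * means "zero or more", so this pattern also accepts a line whose first field is empty, whereas + requires at least one digit. A small sketch of the difference, using made-up sample lines:

```shell
# "*" allows zero digits, so the line with an empty first field is printed too.
star=$(printf ',3\n42,3\n' | awk -F, '$1 ~ /^[[:digit:]]*$/')

# "+" requires at least one digit, so only the second line is printed.
plus=$(printf ',3\n42,3\n' | awk -F, '$1 ~ /^[[:digit:]]+$/')

printf 'with *:\n%s\n' "$star"
printf 'with +:\n%s\n' "$plus"
```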
2
On

Here you are printing every line that matches a pattern, which is exactly the purpose of grep. Since @Inian already explained what was wrong with your code, let me propose a grep-based alternative that gives the same result as the awk command for comma-separated lines, and is typically much faster:

grep -E '^[[:digit:]]+,'
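
One difference is worth knowing: the grep pattern requires a comma after the digits, while the awk version also accepts a line that has only one field. If such lines can occur, a possible adjustment is to allow either a comma or the end of the line. A sketch with invented sample input:

```shell
input=$(printf '122,12\na1,22\n123\n')

# awk: first field (up to the first comma, or the whole line) is all digits.
awk_out=$(printf '%s\n' "$input" | awk -F, '$1 ~ /^[[:digit:]]+$/')

# grep: leading digits followed by either a comma or the end of the line.
grep_out=$(printf '%s\n' "$input" | grep -E '^[[:digit:]]+(,|$)')

printf 'awk:\n%s\ngrep:\n%s\n' "$awk_out" "$grep_out"
```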