awk dot in regex doesn't match space

I want to print everything after the leading whitespace. E.g. given "hello there" I want to print "hello there"; given "      what's up" I want to print "what's up".

I wrote this fully expecting it to work:

{ print $0 }
match($0, /[^[:space:]].*$/) {
    print $1
}

I thought the regex /[^[:space:]].*$/ would match the first non-space character, and then .*$ would match all of the characters after it.

But the regex only seems to capture up to the next whitespace:

$ echo hello there | awk -f after_indent.awk
hello there
hello

minseong On

$1 always refers to the first field of the record; it is not the match result. match() only sets RSTART and RLENGTH, so you have to use substr to extract the matched text:

{ print $0 }
match($0, /[^[:space:]].*$/) {
    print substr($0, RSTART, RLENGTH)
}
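
To see why the original script printed only hello, it can help to print $1 next to what match() actually recorded. This is a minimal sketch; show_match is a hypothetical helper name used only for this illustration, and it assumes an awk that supports POSIX character classes such as [[:space:]].

```shell
# Hypothetical helper: feed one line to awk and report both the first
# field and the span that match() found.
show_match() {
    printf '%s\n' "$1" | awk '
        match($0, /[^[:space:]].*$/) {
            # $1 is the first whitespace-separated field, unrelated to match().
            # RSTART and RLENGTH describe the span the regex really matched.
            printf "field1=[%s] RSTART=%d RLENGTH=%d matched=[%s]\n",
                   $1, RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
        }'
}

show_match "hello there"
show_match "      what's up"
```

In both cases the matched span runs to the end of the line, so the dot does match spaces; only the choice of $1 cut the output short.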
The fourth bird On

You could also simply remove any leading whitespace from the line. Because the * quantifier also matches zero characters, sub always returns 1, so every line is printed whether or not anything was removed.

awk 'sub(/^[[:space:]]*/, "")' file

If both of your example strings are in file, that will print:

hello there
what's up
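
The role of the quantifier can be checked by comparing the return value of sub for * and +. A small sketch, assuming a POSIX awk; count_subs and its variable names are made up for this illustration.

```shell
# Hypothetical helper: report how many substitutions sub() makes with
# the * and + quantifiers on the same input line.
count_subs() {
    printf '%s\n' "$1" | awk '{
        star_line = plus_line = $0
        # * can match the empty string at the start of the line
        star = sub(/^[[:space:]]*/, "", star_line)
        # + needs at least one whitespace character
        plus = sub(/^[[:space:]]+/, "", plus_line)
        printf "star=%d plus=%d\n", star, plus
    }'
}

count_subs "hello there"
count_subs "      what's up"
```

With + as the quantifier, sub returns 0 for lines without leading whitespace, so the one-liner would skip them instead of printing them unchanged.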
glenn jackman On

With GNU awk, the match function can take a 3rd argument: an array that will contain the text matched in capturing parentheses:

gawk 'match($0, /([^[:blank:]].*)/, m) {print m[1]}' file
# ...............^..............^

m[1] contains the text from the 1st pair of parentheses.

Ed Morton On

FWIW I'd just use sed for this, e.g. given this input:

$ cat file
hello there
          what's up

and using sed (the * forms work in any POSIX sed; \+ is a GNU extension), depending on whether or not you really want to duplicate lines in the output and how you want to handle lines that don't start with spaces:

$ sed 's/^[[:space:]]*//p' file
hello there
hello there
what's up
what's up

$ sed 's/^[[:space:]]\+//p' file
hello there
what's up
what's up

$ sed -n 's/^[[:space:]]*//p' file
hello there
what's up

$ sed -n 's/^[[:space:]]\+//p' file
what's up
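
For a sed that lacks \+, the same "one or more" match can be spelled with the POSIX interval \{1,\}. A minimal sketch of the last variant written that way; strip_indented_only is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: print only the lines that had leading whitespace,
# with that whitespace removed. \{1,\} is the POSIX BRE form of "one or more".
strip_indented_only() {
    sed -n 's/^[[:space:]]\{1,\}//p'
}

printf '%s\n' 'hello there' "          what's up" | strip_indented_only
```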
RARE Kpop Manifesto On
echo "      what's up\n \t  hello there   \t " | 

Trimming just the head:

mawk ++NF FS='^[ \t-\r]+' OFS=

|what's up|
|hello there     |

Trimming head and tail:

gawk ++NF FS='^[ \t-\r]+|[ \t-\r]+$' OFS=

|what's up|
|hello there|

If \v, \f, and \r cannot occur in your data, the character class can be a lot cleaner:

change [ \t-\r]+ to [ \t]+