Linux - Read a record to the end with awk


Let's say I have this line from a logfile:

Jun 10 11:09:07 mylinux daemon.notice openvpn[3710]: TCPv4_CLIENT link remote: 1.22.333.444:1111

I don't need the part between "mylinux" and the next colon. This is the part I am trying to remove: daemon.notice openvpn[3710]

I "solved" it with awk, but that's not a good solution.

awk '{print $1,$2,$3,$4,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20;}' /var/log/messages

I just listed many "$" fields to cover as much of each line as possible, but of course this won't work if a line has more fields than I listed.

I know I can check how many fields a line has with "NF", but I don't know how to use this information.

This is what the records in the logfile look like:

Jun 10 11:47:29 FeketeLUA daemon.notice openvpn[3710]: LZO compression initialized
Jun 10 11:47:29 FeketeLUA daemon.notice openvpn[3710]: Attempting to establish TCP connection with 5.55.222.34:1122 [nonblock]
Jun 10 11:47:30 FeketeLUA daemon.notice openvpn[3710]: TCP connection established with 12.11.123.444:1111

6
On BEST ANSWER

I think regexes are the way to go here. This is possible with awk but easier with Perl:

perl -pe 's/mylinux\K.*?(?=TCPv4_CLIENT)/ /' /var/log/messages

Where

  • Everything before \K has to match but is not considered part of the match, so it is kept rather than replaced
  • .*? matches any string non-greedily (i.e., the shortest possible match is taken rather than the longest)
  • (?=TCPv4_CLIENT) is a lookahead term that matches an empty string if (and only if) it is followed by TCPv4_CLIENT

So the regex matches the part between mylinux and the first TCPv4_CLIENT that comes after it, and the substitution replaces that part with a space.

Update: It's actually easier for the changed question since the ending delimiter is part of the removed match and we don't need the lookahead term for it:

perl -pe 's/FeketeLUA\K.*?://' /var/log/messages

\K and .*? continue to work as described before.

3
On

I must be missing something because it sounds like all you need is:

$ sed -r 's/(mylinux)[^:]+:/\1/' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111

$ awk '{x="mylinux"; sub(x"[^:]+:",x)} 1' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111

If instead you wanted to remove the text between two points without naming "mylinux" (for example, by skipping the first four fields), then that'd just be:

$ sed -r 's/(([^ ]+ +){4})[^:]+: /\1/' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111

$ awk '{print gensub(/(([^ ]+ +){4})[^:]+: /,"\\1",1)}' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111

That 2nd awk command used gawk for gensub() - with other awks you'd use match()+substr().
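
A minimal sketch of what the match()+substr() route might look like in a non-GNU awk. It assumes the same layout as above (four space-separated fields, then a tag ending in ": "), and spells the four fields out instead of using {4}, since some older awks do not support interval expressions. The variable names head and rest are made up for this example.

```shell
line='Jun 10 11:09:07 mylinux daemon.notice openvpn[3710]: TCPv4_CLIENT link remote: 1.22.333.444:1111'

result=$(printf '%s\n' "$line" | awk '
{
    # Step 1: find the first four fields, including their trailing spaces.
    if (match($0, /^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +/)) {
        head = substr($0, 1, RLENGTH)
        rest = substr($0, RLENGTH + 1)
        # Step 2: drop the tag at the start of the remainder, up to the first ": ".
        if (match(rest, /^[^:]+: /))
            rest = substr(rest, RLENGTH + 1)
        print head rest
    } else {
        # Lines with a different shape are passed through unchanged.
        print
    }
}')

printf '%s\n' "$result"
# Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111
```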

4
On

The GNU awk way:

awk 'match($0,/(.*mylinux).*(TCPv4_CLIENT.*)/,a){print a[1],a[2]}' file

Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111

Capture the bits you want in array a, then print them.
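
For the NF part of the question, a field-based loop is also possible in any awk. This is only a sketch: it assumes the unwanted tag always occupies exactly fields 5 and 6, which holds for the sample records but is not guaranteed for every log line. The variable names are illustrative.

```shell
line='Jun 10 11:09:07 mylinux daemon.notice openvpn[3710]: TCPv4_CLIENT link remote: 1.22.333.444:1111'

result=$(printf '%s\n' "$line" | awk '
{
    # Keep the date, time and host name (fields 1 to 4).
    out = $1 " " $2 " " $3 " " $4
    # Skip fields 5 and 6, then append every remaining field up to NF.
    for (i = 7; i <= NF; i++)
        out = out " " $i
    print out
}')

printf '%s\n' "$result"
# Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111
```

Because the line is rebuilt from its fields, runs of several spaces are collapsed to a single space, unlike the regex-based commands above.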