Need grep/awk/gawk to return the whole section despite line breaks


I have the following problem... I have a file, similar to this one:

2018-04-25: line1
2018-04-25: line2
        this is another line
        I'm a line
2018-04-25: line3
2018-04-25: line4

If I run: grep 'this' test.log the result will be:

    this is another line

but I need the result to be:

2018-04-25: line2
        this is another line
        I'm a line

because 'this is another line' is actually part of the same entry. The only problem is that the entry contains line breaks, and I need my grep to ignore them.

  • grep -C 1 'this' test.log
  • grep -B 1 'this' test.log

are not really an option, because an entry might have more lines (and line breaks) between its start and its end.


5
glenn jackman On BEST ANSWER

Here's one way using GNU awk: the date at the start of the line is the record separator. For the record containing the pattern, print the previous record separator and the current record.

gawk -v RS='(^|\n)[0-9-]{10}' '
    /this/ {sub(/^\n/, "", prev_RT); print prev_RT $0} 
    {prev_RT = RT}
' file

Or, more straightforwardly:

awk '
    function printif() {if (record ~ /this/) print record}
    /^[0-9-]{10}/ {printif(); record = ""} 
    {record = (record ? record "\n" : "") $0} 
    END {printif()}
' file
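
The second approach can be wrapped in a small shell function so that the search pattern is not hard-coded. This is only a sketch: the name grep_records is made up for illustration, and the date test is spelled out digit by digit on the assumption that some awk implementations do not support interval expressions such as {10}.

```shell
# grep_records PATTERN FILE   (hypothetical helper, not from the answer)
# Prints every record in FILE that matches the awk regex PATTERN.
# A record starts at a line beginning with YYYY-MM-DD and runs up to
# (not including) the next such line.
grep_records() {
    awk -v pat="$1" '
        # Print the collected record if it matches the pattern.
        function emit() { if (have && record ~ pat) print record }

        # A date at the start of a line closes the previous record.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ { emit(); record = ""; have = 0 }

        # Append every line to the record being collected.
        { record = (have ? record "\n" : "") $0; have = 1 }

        # The last record has no following date line, so check it here.
        END { emit() }
    ' "$2"
}

# Usage on the sample from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2018-04-25: line1
2018-04-25: line2
        this is another line
        I'm a line
2018-04-25: line3
2018-04-25: line4
EOF
grep_records 'this' "$sample"
rm -f "$sample"
```

Because the pattern is passed with -v, awk processes backslash escapes in it, so a pattern containing backslashes may need them doubled.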
1
nbari On

If this is the input:

2018-04-25: line1
2018-04-25: line2
        this is another line
        I'm a line
2018-04-25: line3
2018-04-25: line4

You could use: grep -A2 line2 file.log, and it will return:

2018-04-25: line2
        this is another line
        I'm a line

The -A stands for after-context; from the man page:

-A num, --after-context=num
         Print num lines of trailing context after each match. 

Or, if you search for 'this' as the pattern, you could combine -B and -A, for example:

grep -B1 -A1 this file.log
4
Sundeep On

For the given sample, this would work:

$ gawk -v ORS= -v RS='2018-' '/this/{print RS $0}' ip.txt
2018-04-25: line2
        this is another line
        I'm a line
  • -v ORS= clears the output record separator
  • -v RS='2018-' sets 2018- as the input record separator (assuming the year is the same for all records)
  • /this/{print RS $0} if the record contains this, prints the record separator followed by the record content
0
hek2mgl On

Another, multiline, awk version:

#!/usr/bin/awk -f

# When a line starts with the date string,
# a new record starts...
/^[[:digit:]]{4}(-[[:digit:]]{2}){2}/ {
    # Check if the (b)uffer matches /this/
    if(b~/this/)
       # ... and print it in that case
       print b

    # Empty the buffer in any case
    b=""
}

# Append each line to the buffer (no separator before the first line)
{b=(b=="" ? $0 : b ORS $0)}

# The last record is not followed by a date line, so check it at the end
END{if(b~/this/) print b}

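It should work with any awk that supports POSIX interval expressions such as {4}. Some older implementations (for example gawk before 4.0 without --re-interval, or older mawk releases) do not.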

0
kvantour On

Just for completeness, we can also do this with sed, in a more cryptic way:

 sed -n '/[-0-9]\{10\}:/{x;/this/p;d};H;${x;/this/p}' file

or shorter:

 sed -n '/[-0-9]\{10\}:/ba;H;$!b;:a;x;/this/p' file

To understand this you need to know that sed has two buffers. The pattern space is where all operations take place, and the hold space is longer-term storage. The idea is to build the record in the hold space by appending each line with H. However, when a line of the file (i.e. the pattern space) contains a date, a new record starts, so we check what is in the hold space and print it if it matches. Swapping the two spaces is done with x.

Step by step:

sed -n '                       # -n suppresses automatic printing of the pattern space
        /[-0-9]\{10\}:/ba;     # did we find a date? if so, go to label a
        H;                     # otherwise append the line to the hold space
        $!b;                   # not the last line yet? then end this cycle and read the next line
        :a;                    # label a
        x;/this/p              # we found a date or reached the last line:
                               # swap the pattern and hold spaces with x,
                               # check whether the record contains /this/
                               # and print it if so
        ' file
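
To see the two buffers in isolation, here is a minimal sketch, separate from the command above, that only collects lines in the hold space and swaps them back at the end. The function name hold_demo is hypothetical. The script is written one command per line, on the assumption that this form is accepted by more sed implementations than a semicolon-separated one-liner.

```shell
# hold_demo: read lines on stdin, collect them in the hold space with H,
# then on the last line swap them into the pattern space with x,
# turn the embedded newlines into commas, and print the result.
hold_demo() {
    sed -n '
        H
        ${
            x
            s/\n/,/g
            p
        }
    '
}

printf '%s\n' a b c | hold_demo
# prints: ,a,b,c
```

The leading comma shows that H always puts a newline in front of the line it appends, even when the hold space is empty. The command in this answer avoids a leading newline because the date line enters the hold space through x rather than H.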
1
James Brown On

Grep for lines that start with a space, and use -B 1 to include the line before them:

$ grep -B 1 "^ " file
2018-04-25: line2
        this is another line
        I'm a line

If matching on the leading space is not specific enough: grep -B 1 -v "^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}:" file

0
Claes Wikner On

Print from one regex match to another:

awk '/line2/{f=1} f;/I\47m a line/{f=0}' file 

2018-04-25: line2
        this is another line
        I'm a line
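
The flag idiom above behaves like an awk range pattern. As a sketch, the start and end regexes can be passed in as variables instead of being hard-coded. The name print_range is hypothetical, and as in the command above you still need to know in advance where the entry starts and ends.

```shell
# print_range START END FILE   (hypothetical helper)
# Prints each run of lines that begins at a line matching START and
# ends at the next line matching END, inclusive, using a range pattern.
print_range() {
    awk -v start="$1" -v stop="$2" '$0 ~ start, $0 ~ stop' "$3"
}

# Usage on the sample from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2018-04-25: line1
2018-04-25: line2
        this is another line
        I'm a line
2018-04-25: line3
2018-04-25: line4
EOF
print_range 'line2' "I'm a line" "$sample"
rm -f "$sample"
```

Passing the end pattern as a shell argument also sidesteps the \47 escape, since the apostrophe no longer has to appear inside the single-quoted awk program.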