Regex to non-greedily match across multiple lines up to a line that starts with a specific string


I am going to answer this myself, but this was giving me fits all day and although it is explained elsewhere, I thought I'd post it with my solution.

I came across a situation where I needed to replace some text spanning multiple lines. It wasn't hard to find threads about how to match across multiple lines. My case was a bit more difficult in that I needed to wildcard match any character across multiple lines, until stopping at the first non-indented closing bracket.

For demonstration purposes, I made a sample file that has the features that made this hard for me:

starting file:

cat << EOF > test.txt
server {
    abcdefg blablablabla
    pizza
    #blablablabla
    blablablabla {
    zazazazazaza
    }
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab

EOF

This was my desired output. Note that the bracket I am matching to is neither the first nor the last occurrence of the closing bracket. Its only distinguishing feature is that it is the first } at the beginning of a line after the start of my match:

server {
    wxyz

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
} 

zabzazab

This is what I hoped would work. But while slurping with -0777, ^ and $ did not match at the start and end of each line, so nothing was replaced:

~#  perl -0777 -pe 's/(abcdefg(.*?)(^}.*$))/wxyz/gs' test.txt
server {
    abcdefg blablablabla
    pizza
    #blablablabla
    blablablabla {
    zazazazazaza
    }
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab
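
For readers who want to reproduce the no-op outside the shell, here is a minimal sketch in Python rather than Perl. It assumes that Python's re.S and re.M flags behave like Perl's /s and /m for this pattern, and that a string held in memory stands in for the slurped file. TEXT is a shortened, hypothetical version of test.txt. The sketch also shows a second trap: once line anchors are enabled, a greedy .*$ with the s flag runs to the end of the input, so the tail has to be lazy.

```python
import re

# Shortened stand-in for test.txt (hypothetical sample, not the full file).
TEXT = """server {
    abcdefg blablablabla
    blablablabla {
    zazazazazaza
    }
    #}
    ninjas

} #comments that might or might not be here

server {
    blablablabla
}

zabzazab
"""

# Without MULTILINE, ^ matches only at the very start of the input,
# so "^}" can never match after "abcdefg" and nothing is replaced.
no_m = re.sub(r"abcdefg.*?^\}.*?$", "wxyz", TEXT, count=1, flags=re.S)

# With MULTILINE, ^ and $ also match at every line boundary.
with_m = re.sub(r"abcdefg.*?^\}.*?$", "wxyz", TEXT, count=1,
                flags=re.S | re.M)

# Greedy tail: with DOTALL, ".*" swallows newlines and runs to the end.
greedy = re.sub(r"abcdefg.*?^\}.*$", "wxyz", TEXT, count=1,
                flags=re.S | re.M)

print(no_m == TEXT)   # True: the input is unchanged
print(with_m)
print(repr(greedy))
```

The only difference between the first two calls is the MULTILINE flag, which is what the Perl attempt above is missing.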

Matching the line start/end while also slurping was the sticking point. Without the anchors, the match stops at the first }, which is the indented one:

~# perl -0777 -pe 's/(abcdefg(.*?)(}))/wxyz/gs' test.txt
server {
    wxyz
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab


So is there a way I can get a regex to match between a string and the first instance of a } that appears at the beginning of a line? I'm open to using sed too, but I figured the non-greedy nature of my search would make perl a better choice.


There are 3 best solutions below

1
Polar Bear On BEST ANSWER

Perhaps any of the following commands will do it

perl -0777 -pe 's/abcdefg.*?(\nserver.*?)/wxyz\n$1/s' test.txt
perl -0777 -pe 's/abcdefg.*?server/wxyz\n\nserver/s' test.txt
perl -0777 -pe 's/abcdefg.*?}.*?}.*?}.*?\n/wxyz\n/s' test.txt
perl -0777 -pe 's/abcdefg(.*?}){3}.*?\n/wxyz\n/s' test.txt
perl -0777 -pe 's/abcdefg.*?\n}.*?\n/wxyz\n/s' test.txt

Output

server {
    wxyz

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab
6
Stonecraft On

It seems I need both the s and the m flags and the slurping:

~# perl -0777 -pe 's/(abcdefg(.+?)(\n}))/wxyz/sm' test.txt
server {
    wxyz #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

I still don't quite get why I needed both the m modifier AND slurping though. So if someone has a better answer, I'll mark that one instead of my own.
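
On the open question: the m flag only changes where ^ and $ match, and this pattern contains neither, so it should have no effect here. Slurping is a separate matter. perl -p hands the regex one record at a time, and without -0777 a record is a single line, so no pattern could span lines. The sketch below checks the first point in Python rather than Perl, assuming re.S and re.M mirror /s and /m for this pattern. TEXT is a shortened, hypothetical stand-in for test.txt. It also shows one way to consume the trailing comment, using [^\n]* for the rest of the line.

```python
import re

# Shortened stand-in for test.txt (hypothetical sample, not the full file).
TEXT = """server {
    abcdefg blablablabla
    blablablabla {
    zazazazazaza
    }
    #}
    ninjas

} #comments that might or might not be here

server {
    blablablabla
}

zabzazab
"""

PATTERN = r"abcdefg.+?\n\}"

# The pattern has no ^ or $, so adding MULTILINE changes nothing.
only_s = re.sub(PATTERN, "wxyz", TEXT, count=1, flags=re.S)
s_and_m = re.sub(PATTERN, "wxyz", TEXT, count=1, flags=re.S | re.M)
print(only_s == s_and_m)  # True

# The trailing comment survives because the match ends at "}".
# [^\n]* consumes the rest of that line without crossing the newline.
whole_line = re.sub(PATTERN + r"[^\n]*", "wxyz", TEXT, count=1, flags=re.S)
print(whole_line)
```

Here "\n}" does the job that "^}" would do with the m flag, which is why the s flag plus slurping is enough for this pattern.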

0
Cary Swoveland On

As I understand the question, you wish to match the portion of the string

server {
    abcdefg blablablabla
    pizza
    #blablablabla
    blablablabla {
    zazazazazaza
    }
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
... blablablabla
}
...

that begins "abcdefg" and ends at the end of the line, "} #comments that might or might not be here", provided "abcdefg" begins a line after indentation and that line is preceded by the line, "server {". You will then substitute another string for the matched text.

You can match the text to be replaced with the following regular expression:

/^server +\{\s+(abcdefg.+?\n\}.*?$)/sm


The flag s allows . to match newlines. The flag m makes the anchors ^ and $ match at the beginning and end of each line, rather than only at the beginning and end of the string.

We can write the regex in free-spacing mode to make it self-documenting.

/
^server[ ]+\{\s+  # match 'server', 1+ spaces, '{', then 1+
                  #  whitespace chars
(                 # begin capture group 1
  abcdefg         # match literal
  .+?             # match 1+ chars, lazily
  \n              # match a newline
  \}              # match '}'
  .*?             # match 0+ chars, lazily
  $               # match end of line
)                 # end capture group 1
/smx              # single-line, multiline and free-
                  # spacing regex definition modes
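
One caveat with free-spacing mode: unescaped spaces in the pattern are ignored, so a space that must be matched has to be written as [ ] or as an escaped space. The sketch below illustrates this in Python rather than Perl, on the assumption that re.X treats whitespace the same way as Perl's /x for this pattern. TEXT, BROKEN and FIXED are hypothetical names used only for this illustration.

```python
import re

# Tiny hypothetical sample with an indented "}" before the real one.
TEXT = "server {\n    abcdefg one\n    }\n} #tail\n\nserver {\n}\n"

# In verbose mode the space before "+" is dropped, leaving "server+\{",
# which cannot match "server {".
BROKEN = re.compile(r"""
    ^server +\{\s+          # the literal space is ignored here
    (abcdefg .+? \n\} .*? $)
""", re.S | re.M | re.X)

# Inside a character class the space is kept as a literal.
FIXED = re.compile(r"""
    ^server[ ]+\{\s+        # 'server', 1+ spaces, '{', 1+ whitespace
    (                       # begin capture group 1
      abcdefg .+?           # literal, then 1+ chars, lazily
      \n\}                  # newline followed by '}'
      .*? $                 # lazily to the end of that line
    )                       # end capture group 1
""", re.S | re.M | re.X)

print(BROKEN.search(TEXT))                 # None
print(repr(FIXED.search(TEXT).group(1)))
```

The captured group runs from abcdefg through the end of the line that starts with }, skipping the indented } on the way.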