Regex for matching a word, unless the previous line ends with a word

I have a text which contains many sentences, separated by newlines and arbitrary whitespace:

Some thing.
  Some other text.
 Some line.
   Some additional text.
Some stuff.
    Some additional text.
Some additional text.

How do I match only those Some words, where the previous line doesn't end with thing or stuff?

For the example above, I would match these words:

Some thing.           
  Some other text.          <-- skip, previous line ends with "thing."
 [Some] line.
   [Some] additional text.  
[Some] stuff.
    Some additional text.   <-- skip, previous line ends with "stuff."
[Some] additional text.

I tried (?<!thing\.|stuff\.)[\r\n\s]+Some, but I don't know how to include the whitespace and newlines in the negative lookbehind. I've found some examples using \K to allow "variable length" matching, but I obviously don't understand how \K works, since I wasn't able to match anything.

There are 2 best solutions below

Best answer

You can use a 'sacrificial match': a non-capturing group consumes what you don't want, which leaves the other branch of the alternation free to capture what you do want in a capturing group:

/(?:^\s*Some.*(?:thing\.|stuff\.)\s*^\s*Some)|(^\s*Some)/m

Or, if you want the first and the fourth (as stated in comments, your example is inconsistent...)

/(?:(?:thing\.|stuff\.)\s*Some)|(^\s*Some)/m

Or, skip the first Some and include the fourth:

/(?:(?:thing\.|stuff\.)\s*Some)|((?<=\n)\s*Some)/m

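This method works on most regex flavors, since it needs little more than alternation and a capturing group. The matches you want are the ones where capture group 1 participated; matches produced by the sacrificial branch leave group 1 empty and should be discarded.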
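
As a rough illustration of how the capture group is meant to be consumed, here is a minimal Python sketch using the third pattern with the standard re module. The helper name wanted_lines and the idea of reporting line numbers are assumptions made for this example, not part of the answer.

```python
import re

TEXT = (
    "Some thing.\n"
    "  Some other text.\n"
    " Some line.\n"
    "   Some additional text.\n"
    "Some stuff.\n"
    "    Some additional text.\n"
    "Some additional text.\n"
)

# Third pattern from the answer. The left branch is the sacrificial match;
# only the right branch fills group 1. re.MULTILINE mirrors the /m flag,
# although this particular pattern contains no ^ or $ anchors.
PATTERN = re.compile(
    r"(?:(?:thing\.|stuff\.)\s*Some)|((?<=\n)\s*Some)",
    re.MULTILINE,
)

def wanted_lines(text):
    """Return 1-based line numbers of every 'Some' captured in group 1."""
    lines = []
    for m in PATTERN.finditer(text):
        if m.group(1) is None:
            continue  # sacrificial branch matched: ignore this hit
        # Count the newlines before the end of the capture to find its line.
        lines.append(text.count("\n", 0, m.end(1)) + 1)
    return lines

print(wanted_lines(TEXT))  # [3, 4, 5, 7]
```

Note that the captured text includes any leading whitespace matched by \s*, so strip it if only the word itself is needed.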

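A negative lookbehind is a problem in this case because, in many regex flavors (PCRE and Python's re, for example), a lookbehind needs to be fixed width. The \s* you describe is not fixed width. A few engines, such as .NET, do accept variable-width lookbehinds.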
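
The fixed-width restriction can be observed directly. The following is a small sketch that assumes Python's standard re module as the example engine (other flavors may behave differently); the helper compiles is a hypothetical name introduced only for this demonstration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed width: both alternatives are exactly six characters long.
fixed_ok = compiles(r"(?<!thing\.|stuff\.)Some")

# Variable width: \s* can match any number of characters,
# so re rejects it inside a lookbehind.
variable_ok = compiles(r"(?<!(?:thing|stuff)\.\s*)Some")

print(fixed_ok, variable_ok)  # True False
```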

Second answer

You can use the PCRE verbs (*SKIP)(*F) to fail a known match, and put the match you actually want in the other branch of the alternation:

(?:thing|stuff)\.\R\s*\w+(*SKIP)(*F)|\bSome\b

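Here (?:thing|stuff)\.\R\s*\w+(*SKIP)(*F) skips and fails the match when the previous line ends with "thing." or "stuff.": (*F) forces the branch to fail, and (*SKIP) makes the engine resume searching after the text that branch consumed rather than retrying inside it. The right-hand side of the alternation then yields just the matches we want.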
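
Python's standard re module supports neither (*SKIP)(*F) nor \R, so the pattern above will not run there as written. A rough emulation, offered only as a sketch, is to leave the left branch uncaptured and discard its matches. The translation of \R to (?:\r\n|\n|\r) is an assumption that covers only the common line endings, and find_some is a hypothetical helper name.

```python
import re

TEXT = (
    "Some thing.\n"
    "  Some other text.\n"
    " Some line.\n"
    "   Some additional text.\n"
    "Some stuff.\n"
    "    Some additional text.\n"
    "Some additional text.\n"
)

# Left branch: consumes "thing."/"stuff.", the line break, and the first word
# of the next line, standing in for the (*SKIP)(*F) branch.
# Right branch: captures the word we actually want in group 1.
EMULATED = re.compile(r"(?:thing|stuff)\.(?:\r\n|\n|\r)\s*\w+|\b(Some)\b")

def find_some(text):
    """Return 1-based line numbers of each 'Some' that was not skipped."""
    return [
        text.count("\n", 0, m.start(1)) + 1
        for m in EMULATED.finditer(text)
        if m.group(1) is not None  # drop hits from the skipping branch
    ]

print(find_some(TEXT))  # [1, 3, 4, 5, 7]
```

Unlike the lookbehind-based variant, this one also reports the Some on the first line, because no preceding line ends with "thing." or "stuff.".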