I have a text which contains many sentences, separated by newlines and arbitrary whitespace:
Some thing.
Some other text.
Some line.
Some additional text.
Some stuff.
Some additional text.
Some additional text.
How do I match only those Some words where the previous line doesn't end with thing or stuff?
For the example above, I would match these words:
Some thing.
Some other text. <-- skip, previous line ends with "thing."
[Some] line.
[Some] additional text.
[Some] stuff.
Some additional text. <-- skip, previous line ends with "stuff."
[Some] additional text.
I tried (?<!thing\.|stuff\.)[\r\n\s]+Some, but I don't know how to include the whitespace and newlines in the negative lookbehind. I've found some examples using \K to allow "variable length" matching, but I obviously don't understand how \K works at all, since I wasn't able to match anything.
You can use a 'sacrificial match': a non-capturing group matches (and consumes) what you don't want, which then allows matching what you do want in a capturing group.
Or, if you want the first and the fourth (as stated in the comments, your example is inconsistent), the same approach can be adjusted to capture those instead.
Or, skip the first Some and include the fourth.
This method works on most regex flavors.
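Here is a minimal Python sketch of the sacrificial-match idea, assuming Python's built-in re module. The pattern (?:thing|stuff)\.\s*Some|(Some), the TEXT sample and the helper name kept_line_numbers are illustrations of the technique, not the exact pattern from the original demo links. The first alternative consumes "thing." or "stuff.", any whitespace including newlines, and the following Some without capturing it. Every other Some lands in group 1.

```python
import re

# Sample text from the question, one sentence per line.
TEXT = """Some thing.
Some other text.
Some line.
Some additional text.
Some stuff.
Some additional text.
Some additional text."""

# Alternative 1 (sacrificial): "thing." or "stuff.", then any whitespace
# (\s* also matches newlines), then the unwanted "Some"; nothing is captured.
# Alternative 2: any other "Some", captured in group 1.
PATTERN = re.compile(r"(?:thing|stuff)\.\s*Some|(Some)")


def kept_line_numbers(text: str) -> list[int]:
    """Return the 1-based line numbers of the 'Some' words captured by group 1."""
    return [
        text.count("\n", 0, m.start(1)) + 1  # newlines before the hit + 1
        for m in PATTERN.finditer(text)
        if m.group(1) is not None  # skip the sacrificial matches
    ]


print(kept_line_numbers(TEXT))  # → [1, 3, 4, 5, 7]
```

Because re reports a group that did not participate as None, filtering on m.group(1) is all it takes to discard the sacrificial hits. This version also keeps the Some on line 1, since that line has no previous line ending in "thing." or "stuff.".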
A negative lookbehind is a problem in this case because, in many regex flavors (Python's re, for example), a lookbehind must be fixed width. The \s* you describe is not fixed width.
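The next Python check shows why the lookbehind attempt from the question breaks down, assuming Python's re (whose lookbehinds must be fixed width). Other engines differ, and re has no \K. The names ATTEMPT, single_newline, extra_whitespace and variable_lookbehind_rejected are illustrative only. With a single newline the attempt happens to work. With extra whitespace, the engine starts the match one character later, inside the whitespace, where the text behind it is no longer "thing.".

```python
import re

# The attempt from the question. Both lookbehind alternatives are 6 characters
# long, so this is fixed width and Python's re accepts it.
ATTEMPT = re.compile(r"(?<!thing\.|stuff\.)[\r\n\s]+Some")

# One newline: the whitespace run can only start right after "thing.",
# where the negative lookbehind fails, so nothing matches.
single_newline = ATTEMPT.search("Some thing.\nSome other text.")

# Extra whitespace: the match can start at the second space, where the six
# characters behind it are "hing. ", so the lookbehind passes and "Some" matches.
extra_whitespace = ATTEMPT.search("Some thing.  \n  Some other text.")


def variable_lookbehind_rejected() -> bool:
    """Return True if re refuses a lookbehind containing \\s* (not fixed width)."""
    try:
        re.compile(r"(?<!(?:thing|stuff)\.\s*)Some")
    except re.error:
        return True
    return False


print(single_newline)                  # → None (correctly skipped)
print(extra_whitespace is not None)    # → True (wrongly matched)
print(variable_lookbehind_rejected())  # → True
```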