
How does the negative lookbehind work in this regex?

import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

pattern = r'''(?:^.{1,3}$|^.{4}(?<!<!--))'''

matches = re.findall(pattern, text, flags=re.MULTILINE)

print(matches)

OUTPUT with pattern = r'''(?:^.{1,3}$|^.{4}(?<!<!--))''' :

['This', 'Shor', 'Long']

OUTPUT with pattern = r'''(?:^.{1,3}$|^.{3}(?<!<!--))''' :

['Thi', 'Sho', 'Lon', '<!-']

OUTPUT with pattern = r'''(?:^.{1,3}$|^.{5}(?<!<!--))''' :

['This ', 'Short', 'Long ', '<!-- ']

Any count other than 4 in .{4}(?<!<!--) causes the <!-- line to match and appear in the output. Why?

Mark On BEST ANSWER

Here is the regex pattern broken down:

(?:                # match either
    ^.{1,3}$       # ...a whole line of 1 to 3 characters, any characters (e.g. "aaa")
    |              # ...or
    ^.{4}          # ...the first 4 characters of a line, any characters
    (?<!           # provided the 4 characters just before this point are not
        <!--       # these ones
    )
)
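The breakdown above can be run as-is if it is compiled with re.VERBOSE, which ignores whitespace and # comments inside the pattern. This is only a sketch: the names VERBOSE_PATTERN and sample are made up for illustration, and the sample text adds an extra "abc" line (not in the question) to exercise the first alternative.

```python
import re

# Hypothetical name; the pattern is the one from the question, spread out
# with comments. re.VERBOSE strips the whitespace and comments.
VERBOSE_PATTERN = re.compile(r"""
(?:                 # match either
    ^.{1,3}$        # ...a whole line of 1 to 3 characters
    |               # ...or
    ^.{4}           # ...the first 4 characters of a line
    (?<!            # provided the 4 characters just before this point are not
        <!--        # these ones
    )
)
""", flags=re.MULTILINE | re.VERBOSE)

# Assumed sample: the question's lines plus a short "abc" line.
sample = "This is a line.\nShort\nLong line\n<!-- Comment line -->\nabc\n"

result = VERBOSE_PATTERN.findall(sample)
print(result)  # ['This', 'Shor', 'Long', 'abc']
```

The comment line is skipped because the four characters consumed by ^.{4} are exactly <!--, so the negative lookbehind fails there.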

Now that the basic pattern has been broken down, we can look at the variants:

r'''(?:^.{1,3}$|^.{3}(?<!<!--))'''

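With this one, the second alternative no longer lines up with the lookbehind. A lookbehind always inspects the characters that end at the current position, and (?<!<!--) needs four of them. After ^.{3} has consumed <!-, the four characters ending at that position are the preceding newline followed by <!-, which is not <!--. The negative lookbehind therefore succeeds, and that is why <!- is part of the output. Three characters taken from the start of a line can never equal a four-character string.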

r'''(?:^.{1,3}$|^.{5}(?<!<!--))'''

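The same applies here, except that ^.{5} consumes five characters instead of three. On the comment line those are "<!-- " (with a trailing space), so the four characters ending at the current position are "!-- ", not "<!--". The negative lookbehind succeeds again, and "<!-- " is matched.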
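To see what the lookbehind actually compares against, the sketch below slices out the four characters that end where ^.{n} stops on the comment line, next to the result of re.findall for the same n. The helper names find_matches and window_before are made up for illustration; the text and pattern are taken from the question.

```python
import re

text = """
This is a line.
Short
Long line
<!-- Comment line -->
"""

def find_matches(n):
    """Run the question's pattern with .{n} in the second alternative."""
    pattern = r"(?:^.{1,3}$|^.{%d}(?<!<!--))" % n
    return re.findall(pattern, text, flags=re.MULTILINE)

def window_before(n):
    """Return the 4 characters ending where ^.{n} stops on the comment line.

    This is the slice that (?<!<!--) is compared against.
    """
    start = text.index("<!--")   # start of the comment line
    end = start + n              # position after ^.{n}
    return text[end - 4:end]

for n in (3, 4, 5):
    print(n, repr(window_before(n)), find_matches(n))
```

Only for n = 4 is the window exactly <!--, so only then does the negative lookbehind reject the comment line.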

Hope this helps!