Combining positive and negative lookahead in python


I'm trying to extract tokens that satisfy several conditions. I'm using lookaheads to implement two of them:

  1. The tokens must be numeric or alphanumeric (i.e., they must contain at least one digit). They may also contain a few special characters such as '-', '/', '\', '.', '_', etc.

I want to match strings like: 165271, agya678, yah@123, kj*12-

  2. The tokens can't contain consecutive special characters, as in: ajh12-&

I don't want to match strings like: ajh12-&, 671%&i^

I'm using a positive lookahead for the first condition, (?=\w*\d\w*), and a negative lookahead for the second, (?![\_\.\:\;\-\\\/\@\+]{2}).

I'm not sure how to combine these two lookahead conditions.

Any suggestions would be helpful. Thanks in advance.

Edit 1:

I would also like to extract complete tokens that are part of a larger string (i.e., they may appear in the middle of the string).

I would like to match all the tokens in the string: 165271 agya678 yah@123 kj*12-

and none of the tokens (not even a part of a token) in the string: ajh12-& 671%&i^

To force the regex to consider each whole token, I've also added \b to the regexes above: (?=\b\w*\d\w*\b) and (?!\b[\_\.\:\;\-\\\/\@\+]{2}\b)

Best answer, by The fourth bird

You can use

^(?!.*[_.:;\-\\\/@+*]{2})(?=[^\d\n]*\d)[\w.:;\-\\\/@+*]+$


The positive lookahead (?=[^\d\n]*\d) uses a negated character class to match any characters except a digit or a newline, and then matches a digit.

Note that you also have to add * to the character classes (the sample kj*12- contains it), and that most characters don't need to be escaped inside a character class.
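A minimal Python sketch that runs the anchored pattern over the sample strings from the question. The extra input ab12-- is a hypothetical case added here to show that two adjacent allowed special characters are rejected; the variable names are illustrative.

```python
import re

# Anchored pattern from the answer:
#   (?!.*[...]{2})   reject two adjacent special characters anywhere
#   (?=[^\d\n]*\d)   require at least one digit
#   [\w...]+$        allow only word characters and the listed specials
TOKEN_RE = re.compile(r"^(?!.*[_.:;\-\\\/@+*]{2})(?=[^\d\n]*\d)[\w.:;\-\\\/@+*]+$")

should_match = ["165271", "agya678", "yah@123", "kj*12-"]
# "ab12--" is a hypothetical extra case, not from the question
should_not_match = ["ajh12-&", "671%&i^", "ab12--"]

for s in should_match + should_not_match:
    print(s, bool(TOKEN_RE.match(s)))
```

Each of the first four strings matches and the last three do not: ajh12-& and 671%&i^ fail because & and % are not in the allowed class, and ab12-- is caught by the negative lookahead.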

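Alternatively, you could replace the first .* with negated character classes (an "unrolled loop") to reduce backtracking. The repeated group lets the lookahead step past single special characters, so a doubled one later in the string is still caught:

^(?![^_.:;\-\\\/@+*\n]*(?:[_.:;\-\\\/@+*][^_.:;\-\\\/@+*\n]+)*[_.:;\-\\\/@+*]{2})(?=[^\d\n]*\d)[\w.:;\-\\\/@+*]+$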
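To check that writing the negative lookahead as an unrolled loop, [^S\n]*(?:S[^S\n]+)*S{2} where S is the special-character class, rejects the same strings as the .* form, here is a Python sketch comparing the two. The names SPECIAL, NON_SPECIAL and ALLOWED are illustrative, and a-b--1 is a hypothetical input whose doubled special characters come after an earlier single one; a bare [^S\n]*S{2} prefix without the repeated group would miss that case.

```python
import re

SPECIAL = r"[_.:;\-\\/@+*]"          # characters that may not appear twice in a row
NON_SPECIAL = r"[^_.:;\-\\/@+*\n]"   # anything else except a newline
ALLOWED = r"[\w.:;\-\\/@+*]"         # characters a token may consist of

# Form 1: the lookahead scans with .* and backtracks to find two specials.
dotstar = re.compile(rf"^(?!.*{SPECIAL}{{2}})(?=[^\d\n]*\d){ALLOWED}+$")

# Form 2: unrolled loop. Non-specials, then any number of
# (one special followed by non-specials), then two specials.
unrolled = re.compile(
    rf"^(?!{NON_SPECIAL}*(?:{SPECIAL}{NON_SPECIAL}+)*{SPECIAL}{{2}})"
    rf"(?=[^\d\n]*\d){ALLOWED}+$"
)

samples = ["165271", "agya678", "yah@123", "kj*12-",
           "ajh12-&", "671%&i^", "ab12--", "a-b--1", "x.y_z9"]

for s in samples:
    print(s, bool(dotstar.match(s)), bool(unrolled.match(s)))
```

Both columns agree for every sample; in particular a-b--1 is rejected by both forms.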

Edit

Without the anchors, you can use whitespace boundaries to the left (?<!\S) and to the right (?!\S)

(?<!\S)(?!\S*[_.:;\-\\\/@+*]{2})(?=[^\d\s]*\d)[\w.:;\-\\\/@+*]+(?!\S)
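A short Python sketch applying the whitespace-bounded pattern with re.findall to the two strings from the question's edit. The pattern is split into commented pieces only for readability; adjacent raw strings are concatenated into the same regex.

```python
import re

# Whitespace-bounded version: (?<!\S) and (?!\S) play the role of ^ and $,
# so each whitespace-separated token is matched as a whole or not at all.
TOKEN_RE = re.compile(
    r"(?<!\S)"                      # token starts at string start or after whitespace
    r"(?!\S*[_.:;\-\\\/@+*]{2})"    # no two adjacent special characters in the token
    r"(?=[^\d\s]*\d)"               # at least one digit in the token
    r"[\w.:;\-\\\/@+*]+"            # only allowed characters
    r"(?!\S)"                       # token ends at string end or before whitespace
)

print(TOKEN_RE.findall("165271 agya678 yah@123 kj*12-"))
# ['165271', 'agya678', 'yah@123', 'kj*12-']
print(TOKEN_RE.findall("ajh12-& 671%&i^"))
# []
```

Because the lookbehind only succeeds at the start of a token, no partial token such as ajh12 is returned from ajh12-&.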


Answer by Niel Godfrey Pablo Ponciano

You can use multiple lookahead assertions to match only strings that:

  1. (?!.*(?:\W|_){2,}.*) - doesn't have consecutive special characters and
  2. (?=.*\d.*) - has at least 1 digit

^(?!.*(?:\W|_){2,}.*)(?=.*\d.*).*$
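A minimal Python sketch of this pattern. Since \W also matches spaces, the sketch assumes each token is checked on its own after splitting the input on whitespace with str.split(); valid_tokens is a hypothetical helper name used only for illustration.

```python
import re

# Pattern from this answer: no two consecutive non-word characters
# (or underscores), and at least one digit somewhere.
PATTERN = re.compile(r"^(?!.*(?:\W|_){2,}.*)(?=.*\d.*).*$")

def valid_tokens(text):
    # \W also matches spaces, so test each whitespace-separated token
    # separately instead of the whole line at once.
    return [tok for tok in text.split() if PATTERN.match(tok)]

print(valid_tokens("165271 agya678 yah@123 kj*12-"))
# ['165271', 'agya678', 'yah@123', 'kj*12-']
print(valid_tokens("ajh12-& 671%&i^"))
# []
```

Unlike the first answer, this pattern does not restrict which characters a token may contain; it only enforces the two lookahead conditions.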