How does BestMatch findall decide how many results to return?

51 Views Asked by At

I'm struggling to predict how many fuzzy matches findall() will return when using regex in Python with BESTMATCH enabled:

>>> regex.findall(r'(?b)(North\ West){i<=0,s<=2,d<=1}', "South west South West North West", regex.V1)
['North West']

Does not match South West at all

>>> regex.findall(r'(?b)(North\ West){i<=0,s<=2,d<=1}', "South west South West North West North west South West", regex.V1)
['North West', 'North west', 'South West']

Matches South West

I'm not clear if this is a bug or as intended?

1

There are 1 best solutions below

0
On BEST ANSWER

I think I have a partial explanation:

The behaviour appears to be:

  • Return all perfect matches (with no substitutions, insertions or deletions)
  • Return all imperfect matches AFTER the last perfect match

It appears (through testing), that BESTMATCH behaves like a normal search, except if it finds a perfect match, it throws away any preceding imperfect matches. Hence the behaviour that it returns a series of perfect matches, followed by zero or more imperfect matches.

Some examples:

>>> regex.findall(r'(?b)(abc){i<=0,s<=2,d<=1}', "abc abd abd aaa abc", regex.V1)
['abc', 'abc']
>>> regex.findall(r'(?b)(abc){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb", regex.V1)
['abc', 'abc', 'abb']
>>> regex.findall(r'(?b)(ab[cd]){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb", regex.V1)
['abc', 'abd', 'abd', 'abc', 'abb']
>>> regex.findall(r'(?b)(ab[cd]){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb abc", regex.V1)
['abc', 'abd', 'abd', 'abc', 'abc']