In my work I have used approximate string matching algorithms such as Damerau–Levenshtein distance, with great results, to make my code less vulnerable to spelling mistakes.
Now I need to match strings against simple regular expressions such as "TV Schedule for \d\d (Jan|Feb|Mar|...)". This means that the string "TV Schedule for 10 Jan" should return 0, while "T Schedule for 10. Jan" should return 2.
This could be done by generating every string the regex matches (in this case 100 x 12 = 1200) and finding the best match, but that doesn't seem practical.
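For reference, the brute-force approach described above can be sketched in a few lines of Python. This is only an illustration, not a recommended solution: the function names are made up, the month list is my assumption for what the "..." in the pattern stands for, and the distance used is the restricted (optimal string alignment) variant of Damerau–Levenshtein.

```python
from itertools import product

# Assumption: the "..." in the pattern stands for the remaining month abbreviations.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    prev2 = None                        # row i-2, needed for transpositions
    prev = list(range(len(b) + 1))      # row i-1 (distance from empty prefix)
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution or match
            # Swap of two adjacent characters counts as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def candidates():
    """Enumerate all 100 x 12 strings matched by the example pattern."""
    for d1, d2, month in product("0123456789", "0123456789", MONTHS):
        yield f"TV Schedule for {d1}{d2} {month}"


def best_distance(text):
    """Smallest edit distance from text to any string the pattern matches."""
    return min(osa_distance(text, c) for c in candidates())


if __name__ == "__main__":
    print(best_distance("TV Schedule for 10 Jan"))
    print(best_distance("T Schedule for 10. Jan"))
```

This works for the 1200 strings here, but the number of candidates grows multiplicatively with every alternation or character class, and it cannot handle unbounded patterns such as \d+ at all, which is why I am looking for something better.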
Do you have any ideas how to do this efficiently?
I found the TRE library, which seems to do exactly this: fuzzy matching of regular expressions. Example: http://hackerboss.com/approximate-regex-matching-in-python/ It only supports insertion, deletion and substitution though, not transposition, but I guess that works OK.
I tried the accompanying agrep tool with the regexp on the following file:
and got
Thanks a lot for all your suggestions.