I'm trying to optimize this solution for a function that accepts 2 arguments: fullstring
and substring
. The function will return True
if the substring exists in the fullstring, and False
if it does not. There is one special wildcard that could be entered in the substring that denotes 0 or 1 of the previous symbol, and there can be more than one wildcard in the substring.
For example, "a*"
means ""
or "a"
The solution I have works fine but I'm trying to reduce the number of for loops (3) and optimize for time complexity. Using regex is not permitted. Is there a more pythonic way to do this?
Current Solution:
def complex_search(fullstring, substring):
patterns = []
if "*" in substring:
index = substring.index("*")
patterns.append(substring[:index-1] + substring[index+1:])
patterns.append(substring[:index] + substring[index+1:])
else:
patterns.append(substring)
def check(s1, s2):
for a, b in zip(s1, s2):
if a != b:
return False
return True
for pattern in patterns:
for i in range(len(fullstring) - len(pattern) + 1):
if check(fullstring[i:i+len(pattern)], pattern):
return True
return False
>> print(complex_search("dogandcats", "dogs*andcats"))
>> True
Approach
Create all alternatives for the substring based upon '" in substring (can have zero or more '' in substring)
See Function combs(...) below
Use Aho-Corasick to check if one of the substring patterns is in the string. Aho-Corasick is a very efficient algorithm for checking if one or more substrings appear in a string and formed as the basis of the original Unix command fgrep.
For illustrative purposes a Python version of Aho-Corasik is used below, but a C implementation (with Python wrapper) is available at pyahocorasick for higher performance.
See class Aho-Corasick below.
Code
Test