Is there a simple way in Python to search a string for matches ("coincidences") with another one? Or does anyone know how it could be done?
To make myself clear, here is an example.
text_sample = "baguette is a french word"
words_to_match = ("baguete","wrd")
letters_to_match = ('b','a','g','u','t','e','w','r','d') # With just one 'e'
coincidences = sum(text_sample.count(x) for x in letters_to_match)
# coincidences = 14 Current output
# coincidences = 10 Expected output
My current method breaks words_to_match into single characters, as in letters_to_match, but then every occurrence of each of those letters anywhere in "baguette is a french word" is counted (coincidences = 14).
What I want to obtain is coincidences = 10, where only the letters of "baguette is a french word" that line up with the entries of words_to_match are counted (7 for "baguete" plus 3 for "wrd"), that is, by checking the similarity between words_to_match and the words in text_sample.
How do I get my expected output?
It looks like you need the length of the longest common subsequence (LCS). See the algorithm in the Wikipedia article for computing it. You may also be able to find a C extension which computes it quickly. For example, this search has many results, including pylcs. After installation (pip install pylcs):
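If installing an extension is not convenient, the same number can also be obtained without pylcs. The sketch below is a plain-Python version of the dynamic-programming recurrence described in the Wikipedia article. The helper name lcs_length is made up for this illustration and is not part of pylcs or the standard library. It assumes the expected output is the sum of the LCS length of text_sample against each entry of words_to_match, which is one reading of the question.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    # prev[j] is the LCS length of the part of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # LCS with an empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise keep the better of dropping one character
                # from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


text_sample = "baguette is a french word"
words_to_match = ("baguete", "wrd")

# Sum the LCS length of the text against each word separately.
coincidences = sum(lcs_length(text_sample, word) for word in words_to_match)
print(coincidences)  # 10
```

This runs in time proportional to len(a) * len(b) per word, which is fine for short strings. For long texts, a compiled implementation is likely to be noticeably faster.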