I am a little puzzled by two different answers returned by SequenceMatcher depending on the order of the arguments. Why is it so?
Example
SequenceMatcher is not commutative:
>>> from difflib import SequenceMatcher
>>> SequenceMatcher(None, "Ebojfm Mzpm", "Ebfo ef Mfpo").ratio()
0.6086956521739131
>>> SequenceMatcher(None, "Ebfo ef Mfpo", "Ebojfm Mzpm").ratio()
0.5217391304347826
SequenceMatcher.ratio internally uses SequenceMatcher.get_matching_blocks to calculate the ratio. I will walk you through the steps to see how that happens.
ratio takes the results of get_matching_blocks and sums the sizes of all the matched blocks it returns. The line in difflib.py that computes this sum is critical, because its result is used to compute the ratio. We'll see shortly how it affects the calculation.
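To see that sum for the two calls in the question, here is a minimal sketch. It mirrors what ratio does rather than quoting difflib.py, and the helper name matched_size is made up for illustration.

```python
from difflib import SequenceMatcher


def matched_size(a, b):
    """Hypothetical helper: total size of all blocks matched between a and b."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    # Each block is a triple (i, j, size); the final one is a zero-size sentinel,
    # so it does not change the sum.
    return sum(size for _, _, size in blocks)


first = matched_size("Ebojfm Mzpm", "Ebfo ef Mfpo")
second = matched_size("Ebfo ef Mfpo", "Ebojfm Mzpm")
print(first, second)  # prints: 7 6
```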
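Call the matcher from the first call m1 and the one with the arguments swapped m2. For m1 and m2 the sums are 7 and 6, respectively. These are simply the sums of the sizes of the matched blocks returned by get_matching_blocks.
Why does this matter? Because the ratio is computed as 2.0 * matches / length (this is from the difflib source code), where matches is that sum and length is len(a) + len(b), with a being the first sequence and b the second.
Okay, enough talk, let's plug in the numbers. Here len(a) + len(b) is 11 + 12 = 23 in either order, so m1 gives 2.0 * 7 / 23 = 0.6086956521739131.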
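The arithmetic for m1 can be checked by hand. This is only a sketch that reapplies the formula above to the public attributes of the matcher; the variable names are my own.

```python
from difflib import SequenceMatcher

m1 = SequenceMatcher(None, "Ebojfm Mzpm", "Ebfo ef Mfpo")

# Total size of the matched blocks (the zero-size sentinel adds nothing).
matches = sum(size for _, _, size in m1.get_matching_blocks())
# Combined length of both input sequences.
length = len(m1.a) + len(m1.b)

by_hand = 2.0 * matches / length
print(matches, length)        # prints: 7 23
print(by_hand == m1.ratio())  # prints: True
```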
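Similarly for m2, the matcher with the arguments swapped: 2.0 * 6 / 23 = 0.5217391304347826.
Note: SequenceMatcher(None, a, b).ratio() == SequenceMatcher(None, b, a).ratio() is not always False; sometimes it is True. For example, SequenceMatcher(None, "abcd", "bcde") and SequenceMatcher(None, "bcde", "abcd") return the same ratio.
In case you're wondering why, this is because the sum of the matched block sizes is the same for both SequenceMatcher(None, "abcd", "bcde") and SequenceMatcher(None, "bcde", "abcd"), which is 3.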
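A quick way to confirm the symmetric case is to run both orders side by side. This is a small sketch, with the names forward and backward chosen just for this example.

```python
from difflib import SequenceMatcher

forward = SequenceMatcher(None, "abcd", "bcde")
backward = SequenceMatcher(None, "bcde", "abcd")

for matcher in (forward, backward):
    # Both orders match the single block "bcd", so the sum is 3 each time.
    size = sum(s for _, _, s in matcher.get_matching_blocks())
    print(size, matcher.ratio())  # prints: 3 0.75 (for both matchers)
```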