difflib.SequenceMatcher not returning unique ratio


I am trying to compare two street networks, and when I run this code it returns a single ratio of .253529... I need it to compare each row and return a unique value, so I can query out the streets that don't match. What can I do to get it to return a unique ratio value per row?

# Set local variables
inFeatures = gp.GetParameterAsText(0)
fieldName = gp.GetParameterAsText(1)
fieldName1 = gp.GetParameterAsText(2)
fieldName2 = gp.GetParameterAsText(3)
expression = difflib.SequenceMatcher(None,fieldName1,fieldName2).ratio()

# Execute CalculateField
arcpy.CalculateField_management(inFeatures, fieldName, expression, "PYTHON_9.3")

Best answer

If you know both files always have the exact same number of lines, a simple approach like this would work:

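import difflib

ratios = []

# 'fieldName1' and 'fieldName2' are used here as file names, one value per line
with open('fieldName1', 'r') as f1, open('fieldName2', 'r') as f2:
    for l1, l2 in zip(f1, f2):
        # strip the trailing newline so it doesn't affect the ratio
        l1, l2 = l1.rstrip('\n'), l2.rstrip('\n')
        R = difflib.SequenceMatcher(None, l1, l2).ratio()
        ratios.append((l1, l2, R))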

This will produce a list of tuples like this:

[("aa", "aa", 1.0), ("aa", "ab", 0.5), ...]
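As far as I can tell, the code in the question calls SequenceMatcher once, on the two field-name strings themselves, which would explain why every row ends up with the same ratio. To query out the streets that don't match, you could compute one ratio per pair and keep the pairs below some cutoff. This is only a rough sketch: the sample street names, the `row_ratios` helper and the `THRESHOLD` value are made up for illustration and are not from your data.

```python
import difflib

def row_ratios(names_a, names_b):
    """Return one (a, b, ratio) tuple per pair of rows."""
    return [
        (a, b, difflib.SequenceMatcher(None, a, b).ratio())
        for a, b in zip(names_a, names_b)
    ]

# Hypothetical sample data standing in for the two street-name fields
streets_a = ["MAIN ST", "OAK AVE", "ELM RD"]
streets_b = ["MAIN ST", "OAK AV", "PINE RD"]

THRESHOLD = 0.9  # assumed cutoff; tune it for your own data

rows = row_ratios(streets_a, streets_b)

# Keep only the pairs whose similarity falls below the cutoff
mismatches = [(a, b, r) for a, b, r in rows if r < THRESHOLD]
```

With this sample data only the "ELM RD" / "PINE RD" pair falls below the cutoff. A near-match like "OAK AVE" / "OAK AV" stays above it, so the threshold you pick decides how strict the comparison is.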

If your files have different numbers of lines, zip stops at the end of the shorter one, so you'll need to find some way to match up the lines, or otherwise handle the leftovers.
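One possible way to handle the leftover lines, if pairing by position is still what you want, is `itertools.zip_longest`, which pads the shorter input instead of stopping early. This is a sketch assuming Python 3 (in Python 2 the function is named `izip_longest`). The `ratios_padded` helper is a hypothetical name, and padding with an empty string is just one choice.

```python
import difflib
from itertools import zip_longest

def ratios_padded(lines_a, lines_b):
    """Pair lines by position, padding the shorter input with ''."""
    results = []
    for a, b in zip_longest(lines_a, lines_b, fillvalue=""):
        r = difflib.SequenceMatcher(None, a, b).ratio()
        results.append((a, b, r))
    return results

# The second list is one item short, so "bb" is compared against ""
padded = ratios_padded(["aa", "bb"], ["aa"])
```

Unpaired lines come out with a ratio of 0.0, so they are easy to spot. This doesn't help if the lines are out of order, though. In that case you'd need a real key to match on.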