Is there another way to compute a similarity metric between records with low overhead and high accuracy (other than the Jaro-Winkler algorithm)?

Asked by Ranjith Udayakumar on 27 July 2025 at 20:01 (628 views)

I am computing a similarity metric between strings with the Jaro-Winkler algorithm in Python. I am using an Anaconda environment, deployed on an Alibaba Cloud ECS instance.

The sample code I am using to find similarity:

from pyjarowinkler import distance
print("Average Score ---->", distance.get_jaro_distance("hello", "haloa"))

Average Score ----> 0.76

When I process 600k records, it takes more than 20 minutes, which is far too slow for this volume of data. Is there another way to compute a similarity metric between records with low overhead and high accuracy?
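If the 600k records are being compared against one another (the question does not say how the pairs are formed, so this is an assumption), the number of comparisons can matter more than the cost of each one: an all-pairs comparison needs n(n - 1)/2, roughly 1.8 × 10^11, scorer calls for n = 600,000. A common way to cut this, whatever metric is used, is blocking: group records by a cheap key and score only pairs that share a group. The sketch below is a minimal standard-library illustration; the key (the first prefix_len lower-cased characters) and the helper names blocking_key and candidate_pairs are hypothetical choices, not part of pyjarowinkler.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Tuple


def blocking_key(record: str, prefix_len: int = 2) -> str:
    """Hypothetical blocking key: the first prefix_len characters, lower-cased."""
    return record.strip().lower()[:prefix_len]


def candidate_pairs(records: List[str], prefix_len: int = 2) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (i, j) with i < j for records that share a blocking key.

    Only these pairs would be passed to the (expensive) similarity scorer,
    instead of all n * (n - 1) / 2 pairs.
    """
    blocks: Dict[str, List[int]] = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec, prefix_len)].append(idx)
    for members in blocks.values():
        # Indices were appended in increasing order, so every pair has i < j.
        yield from combinations(members, 2)


if __name__ == "__main__":
    records = ["Hello", "haloa", "help", "world", "word", "xyz"]
    print(list(candidate_pairs(records)))                # [(0, 2), (3, 4)]
    print(list(candidate_pairs(records, prefix_len=1)))  # [(0, 1), (0, 2), (1, 2), (3, 4)]
```

Each yielded pair would then be scored with distance.get_jaro_distance (or any other metric). Blocking trades some recall for speed: with prefix_len=2, "Hello" and "haloa" land in different blocks ("he" and "ha"), so that pair is never scored, while a looser key such as prefix_len=1 keeps it at the cost of more comparisons.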


Answer by Ashly Taylor on 28 November 2018 at 17:06 (best answer)

The Jaro-Winkler distance is a similarity score between two strings. The Jaro measure averages the fraction of matched characters in each of the two strings with the fraction of matches that are not transposed (matched characters that appear in a different order). Winkler's modification raises this score for strings that share matching initial characters.
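For reference, here are the standard textbook definitions (individual implementations can differ in details). Here m is the number of matching characters (equal characters no more than ⌊max(|s1|, |s2|)/2⌋ - 1 positions apart), t is half the number of matching characters that appear in a different order, ℓ is the length of the common prefix (capped at 4), and p is the scaling factor:

```latex
\mathrm{sim}_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell\, p\,(1 - \mathrm{sim}_j).
```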

The original implementation follows the Jaro-Winkler algorithm as described in the Wikipedia article on it. This Python version is based on the implementation in the Apache StringUtils library.

Unit tests similar to those in the StringUtils library were used to validate the implementation.
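To make the definition concrete, here is a minimal pure-Python sketch of the textbook Jaro and Jaro-Winkler formulas, with a few checks in the spirit of such unit tests. It is not pyjarowinkler's code: the function names jaro and jaro_winkler are hypothetical, and edge cases (for example case handling or rounding of the result) may differ from the library.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters "match" only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Walk both sequences of matched characters in order; each position where
    # they differ is half a transposition.
    half_t = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_t += 1
            k += 1
    t = half_t // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro score boosted by the length of the common prefix (at most 4 characters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scaling * (1 - j)


if __name__ == "__main__":
    print(jaro_winkler("hello", "haloa"))   # approximately 0.76
    print(jaro_winkler("MARTHA", "MARHTA"))  # approximately 0.961
    print(jaro_winkler("DWAYNE", "DUANE"))   # approximately 0.84
```

Because it runs a nested Python loop for every pair, this sketch is meant for understanding the metric, not for speeding up a 600k-record job.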

>>> from pyjarowinkler import distance
>>> # Scaling is 0.1 by default
>>> print(distance.get_jaro_distance("hello", "haloa", winkler=True, scaling=0.1))
0.76
>>> # winkler=False returns the plain Jaro score (rounded here for display)
>>> print(round(distance.get_jaro_distance("hello", "haloa", winkler=False, scaling=0.1), 4))
0.7333
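Both outputs can be checked by hand. With a match window of ⌊5/2⌋ - 1 = 1, the matching characters of "hello" and "haloa" are h, l and o (m = 3), all in the same order (t = 0), and the strings share the one-character prefix "h" (ℓ = 1). With the scaling factor p = 0.1, the Jaro score sim_j and the Jaro-Winkler score sim_w come out as:

```latex
\begin{aligned}
\mathrm{sim}_j &= \frac{1}{3}\left(\frac{3}{5} + \frac{3}{5} + \frac{3 - 0}{3}\right) = \frac{11}{15} \approx 0.7333,\\
\mathrm{sim}_w &= \frac{11}{15} + 1 \cdot 0.1 \cdot \left(1 - \frac{11}{15}\right) = \frac{11}{15} + \frac{0.4}{15} = 0.76.
\end{aligned}
```

The first line is the winkler=False result and the second is the winkler=True result.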


I hope this will help you regarding your query.
