I'm doing a fuzzy match test between an input string and some previously entered strings. The test is performed live while typing.
I already have a shockingly accurate algorithm in place called StrikeAMatch, which has been translated into many languages. The fastest Ruby implementation I've found is amatch. However, it is incompatible with my JRuby environment because it crunches data in a C extension that requires the C interpreter for Ruby (MRI). It's pretty fast though:
require 'benchmark'  # Benchmark.measure
require 'amatch'     # provides pair_distance_similar
a = "Lorem ipsum dolor"
b = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nam cursus. Morbi ut mi. Nullam enim leo, egestas id, condimentum at, laoreet mattis, massa. Sed eleifend nonummy diam. Praesent mauris ante,"
puts Benchmark.measure {
  10000.times { a.pair_distance_similar(b) }
}
# => 0.130000 0.000000 0.130000 ( 0.146347)
I hope I can avoid setting up an alternative environment. An alternative approach could be to try and port the original Java code as suggested in the JRuby Wiki. Not sure how to do that though.
Any ideas about how to approach this?
The algorithm is easy to implement. For example, here's a quick implementation I wrote in Java:
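A minimal sketch of what such an implementation can look like, assuming the usual letter-pair formulation: split each string into words, collect the adjacent character pairs of every word, and score twice the number of shared pairs over the total number of pairs. The method name `compare` and the helper names are illustrative choices, and returning 0.0 when neither string yields any pair is an assumption as well.

```java
import java.util.ArrayList;
import java.util.List;

public class StrikeAMatch {

    // Adjacent character pairs of one word, e.g. "FRANCE" -> FR, RA, AN, NC, CE.
    private static List<String> letterPairs(String word) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < word.length() - 1; i++) {
            pairs.add(word.substring(i, i + 2));
        }
        return pairs;
    }

    // Pairs of every whitespace-separated word, so no pair spans a space.
    private static List<String> wordLetterPairs(String text) {
        List<String> pairs = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            pairs.addAll(letterPairs(word));
        }
        return pairs;
    }

    // Similarity in [0, 1]: 2 * shared pairs / (pairs in a + pairs in b).
    public static double compare(String a, String b) {
        List<String> pairsA = wordLetterPairs(a.toUpperCase());
        List<String> pairsB = wordLetterPairs(b.toUpperCase());
        int total = pairsA.size() + pairsB.size();
        if (total == 0) {
            return 0.0; // assumption: nothing to compare counts as no match
        }
        int shared = 0;
        for (String pair : pairsA) {
            // Remove each matched pair so duplicates are only counted once.
            if (pairsB.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        // FR and NC are shared out of 5 + 5 pairs: 2 * 2 / 10
        System.out.println(compare("France", "French")); // prints 0.4
    }
}
```

Words of a single character produce no pairs in this form, so they never contribute to the score.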
You can even pad the first and last characters, to extend it to one-character or zero-character terms:
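One way the padding could be done, as a sketch only: wrap every word in a sentinel character before extracting pairs, so a one-character word yields two pairs and an empty term yields one. The sentinel (`'\0'`), the class name `PaddedStrikeAMatch`, and padding per word rather than per string are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PaddedStrikeAMatch {

    // Sentinel assumed not to occur in real input.
    private static final char PAD = '\0';

    // Pairs of every word after padding it on both sides:
    // "A" -> [PAD A, A PAD], "" -> [PAD PAD].
    private static List<String> paddedPairs(String text) {
        List<String> pairs = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String padded = PAD + word + PAD;
            for (int i = 0; i < padded.length() - 1; i++) {
                pairs.add(padded.substring(i, i + 2));
            }
        }
        return pairs;
    }

    // Same score as the unpadded form: 2 * shared pairs / total pairs.
    public static double compare(String a, String b) {
        List<String> pairsA = paddedPairs(a.toUpperCase());
        List<String> pairsB = paddedPairs(b.toUpperCase());
        int total = pairsA.size() + pairsB.size(); // at least 2 because of padding
        int shared = 0;
        for (String pair : pairsA) {
            if (pairsB.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        System.out.println(compare("a", "a")); // prints 1.0
        System.out.println(compare("a", "b")); // prints 0.0
    }
}
```

Padding also gives extra weight to the first and last letter of each word, so scores differ slightly from the unpadded form.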
To validate it, here are the examples proposed in the link you provided:
... and here's the padded version:
If you want to use it directly from JRuby, you only need to add StrikeAMatch.class to your $CLASSPATH and, within your script, require 'java' followed by java_import 'StrikeAMatch'.
To invoke it: