
What is the logic of approximate string matching?


Does anybody know the reason for the following behavior?

agrepl("cold", "cool")
#> [1] FALSE
agrepl("cool", "cold")
#> [1] TRUE

Best answer, by Max Teflon

The documentation for max.distance describes the default as follows:

If cost is not given, all defaults to 10%, and the other transformation number bounds default to all. The component names can be abbreviated.

And:

Expressed either as integer, or as a fraction of the pattern length times the maximal transformation cost (will be replaced by the smallest integer not less than the corresponding fraction)

So the default maximum number of transformations for a pattern of length 4 is 1 (10% of 4 is 0.4, rounded up to 1). In agrepl("cool", "cold"), the pattern cool matches the col at the beginning of cold with only 1 deletion. In agrepl("cold", "cool"), changing the pattern cold to match cool would take at least two transformations (two substitutions, or one deletion and one insertion), which exceeds the default bound.

These examples might explain it a bit further:

agrepl("cold", "cool", max.distance = 1) # two changes necessary
#> [1] FALSE
agrepl("cold", "cool", max.distance = 2)
#> [1] TRUE
agrepl("cold", "coold") # just one insertion necessary
#> [1] TRUE
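
The distances behind these results can also be checked directly. The following is a minimal sketch, assuming the default maximal transformation cost of 1 and using adist() with partial = TRUE, which the R documentation describes as the substring distance that agrep uses by default. The variable names (default_max, d_cool_in_cold, d_cold_in_cool) are only illustrative.

```r
pattern <- "cool"

# Default bound: 10% of the pattern length, rounded up to the next integer
# (assumption: the maximal transformation cost is left at its default of 1).
default_max <- ceiling(0.1 * nchar(pattern))
default_max
#> [1] 1

# Partial (substring) edit distance of the pattern (first argument)
# against the target string (second argument).
d_cool_in_cold <- as.integer(adist("cool", "cold", partial = TRUE))
d_cold_in_cool <- as.integer(adist("cold", "cool", partial = TRUE))

d_cool_in_cold <= default_max  # within the bound, so agrepl() reports a match
d_cold_in_cool <= default_max  # exceeds the bound, so agrepl() reports no match
```

Both patterns have length 4, so the bound is the same in both directions; only the distance differs, which is why swapping the arguments changes the result.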