Computing edit distance using two simple columns from iris dataset

In the code below, I want to compute the similarity between two columns of text strings. To do this, I take the first 10 rows of the "Petal.Length" column from iris and assign them to a1, and the first 4 rows of the "Sepal.Length" column from iris and assign them to a2. My goal is for each a2 value to be compared with every a1 value using the formula in the last line, so that the final vector percent_calc has 40 values.

library(stringdist)
library(RecordLinkage)

a1 = iris$Petal.Length[1:10] * 1000
a2 = iris$Sepal.Length[1:4]  * 1000
a1 = as.character(a1)
a2 = as.character(a2)

percent_calc = RecordLinkage::levenshteinSim(a2,a1)

There is 1 best solution below

Build every combination of a1 and a2 with expand.grid, then compute the similarity for each pair:

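# One row per (a1, a2) pair: 10 x 4 = 40 rows, with a1 (Var1) varying fastest
a12 <- expand.grid(a1, a2, stringsAsFactors = FALSE)

# Both arguments now have length 40, so the comparison is element by element
percent_calc <- RecordLinkage::levenshteinSim(a12$Var1, a12$Var2)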

percent_calc
# [1] 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50
# [19] 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.75 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50
# [37] 0.50 0.50 0.50 0.50
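
As a cross-check that does not need RecordLinkage, the same 40 values can probably be reproduced in base R. This is only a sketch: it assumes the similarity is defined as 1 minus the Levenshtein distance divided by the length of the longer string, which is my understanding of levenshteinSim rather than something stated above. The names dist_mat, max_len, sim_mat and percent_calc_check are made up for illustration.

```r
# Rebuild the two character vectors from the question
a1 <- as.character(iris$Petal.Length[1:10] * 1000)
a2 <- as.character(iris$Sepal.Length[1:4] * 1000)

# utils::adist() returns the Levenshtein distance for every pair:
# a 10 x 4 matrix with rows following a1 and columns following a2
dist_mat <- adist(a1, a2)

# Length of the longer string in each pair, in the same 10 x 4 layout
max_len <- outer(nchar(a1), nchar(a2), pmax)

# Assumed similarity definition: 1 - distance / length of the longer string
sim_mat <- 1 - dist_mat / max_len
dimnames(sim_mat) <- list(a1, a2)

# Flatten column by column, so a1 varies fastest, as in expand.grid(a1, a2)
percent_calc_check <- as.vector(sim_mat)
length(percent_calc_check)
```

Keeping the result as a matrix (sim_mat) can also be handy, because each cell is labelled with the two strings that were compared, e.g. sim_mat["1700", "4700"].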