How do I get LSAfun to compare two columns in each row of a data frame in R?

I'm a neophyte in R.

I have a data frame of about 4,000 conversations between two people. It's structured roughly like this:

Unique Identifier   column1    column2
123456              blahblah   blahblah
789412              blahblah   blahblah

My goal is to get a similarity score between message 1 (column1) and message 2 (column2) in each row, so that the data frame eventually looks like this:

Unique Identifier   column1    column2    cosine
123456              blahblah   blahblah   .562
789412              blahblah   blahblah   .264

Ultimately, I'd have about 4,000 scores (one per row). I'm assuming that LSAfun's costring() is the right function for this, but I keep getting errors. I suspect it's because R doesn't know that I want to compare column1 and column2 within each row.
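One likely cause of the errors: as far as I understand LSAfun, costring() is not vectorized over rows. It needs a semantic space passed through its tvectors argument, and when it receives a character vector longer than one element it treats the elements as individual words, so passing two whole columns would give at most one score rather than one per row. The usual R pattern is to call the function once per row with mapply(). To keep the sketch self-contained, it uses a hypothetical toy_space matrix and a hypothetical toy_costring() that only mimics the costring idea (sum the word vectors of each string, then take their cosine); it is not LSAfun's implementation.

```r
# Hypothetical toy "semantic space": one row per word, 3 made-up dimensions.
# A real LSA space (the kind LSAfun expects in tvectors) would be far larger.
toy_space <- matrix(
  c(1,   0,   0,
    0.9, 0.1, 0,
    0,   1,   0,
    0,   0.2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("hello", "hi", "test", "goodbye"), NULL)
)

# Hypothetical stand-in for costring(): split each string on spaces, keep the
# words present in the space, sum their vectors, and return the cosine.
toy_costring <- function(x, y, tvectors) {
  words_x <- strsplit(x, " ", fixed = TRUE)[[1]]
  words_y <- strsplit(y, " ", fixed = TRUE)[[1]]
  words_x <- words_x[words_x %in% rownames(tvectors)]
  words_y <- words_y[words_y %in% rownames(tvectors)]
  if (length(words_x) == 0 || length(words_y) == 0) return(NA_real_)
  vx <- colSums(tvectors[words_x, , drop = FALSE])
  vy <- colSums(tvectors[words_y, , drop = FALSE])
  sum(vx * vy) / sqrt(sum(vx^2) * sum(vy^2))
}

df <- data.frame(
  id = c(123456, 789412),
  column1 = c("hello test", "hi"),
  column2 = c("hi test", "goodbye"),
  stringsAsFactors = FALSE
)

# mapply() calls toy_costring() once per row, pairing column1[i] with
# column2[i], so each conversation gets its own score.
df$cosine <- mapply(toy_costring, df$column1, df$column2,
                    MoreArgs = list(tvectors = toy_space), USE.NAMES = FALSE)
df
```

With LSAfun itself, the same pattern would presumably be mapply(function(a, b) costring(a, b, tvectors = my_space), df$column1, df$column2), where my_space is a placeholder name for whatever LSA space matrix you have loaded.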

Consider the stringdist package instead:

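library(stringdist)
library(dplyr)  # provides tibble(), %>% and mutate()

test_data <- tibble(col1 = c("blaahblah", "hello this is a test"),
                    col2 = c("blaahblah", "goodbye test"))

# stringdist() is vectorized, so each row's col1 is compared with its own col2
test_data %>%
  mutate(cosine = 1 - stringdist(col1, col2, method = "cosine"))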

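Subtracting the cosine distance from 1 gives the cosine similarity. Note that with the default q = 1, stringdist compares character counts rather than words, so this measures surface overlap between the strings, not the semantic (LSA) similarity that costring() computes.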

  col1                 col2         cosine
  <chr>                <chr>         <dbl>
1 blaahblah            blaahblah     1    
2 hello this is a test goodbye test  0.621
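To see why "hello this is a test" and "goodbye test" score 0.621 despite sharing only one word, here is a base-R sketch of the same measure, assuming stringdist's default q = 1 (single-character counts, spaces included). char_cosine() is a hypothetical helper written for illustration, not part of stringdist.

```r
# Hypothetical helper: cosine similarity of character-count (q = 1) profiles,
# intended to match 1 - stringdist(a, b, method = "cosine") with the default q.
char_cosine <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  chars <- union(ca, cb)                    # combined alphabet of both strings
  va <- table(factor(ca, levels = chars))   # character counts for a
  vb <- table(factor(cb, levels = chars))   # character counts for b
  sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

round(char_cosine("hello this is a test", "goodbye test"), 3)
#> [1] 0.621
```

The shared letters (e, o, s, t) and the space drive the score, which is why character-level cosine can rate two quite different messages as fairly similar.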