Strange lemmatization result in R (textstem package)


I would like to get the lemma "dive" from every form of the word using the textstem package in R.

However, when I use textstem, the base form "dive" itself is lemmatized to something unexpected:

library(textstem)
words <- c("dived", "diving", "dive")

lemmatize_strings(words, dictionary = lexicon::hash_lemmas)

[1] "dive" "dive" "diva"

Here I do not want "diva" as the result for the word "dive". Instead, "dive" should be lemmatized to "dive", so that it is counted as the same word as the other forms "dived" and "diving". The output should look like this:

[1] "dive" "dive" "dive"
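One possible workaround, offered as a sketch under assumptions rather than a confirmed fix: `lemmatize_strings()` takes a `dictionary` argument, and `lexicon::hash_lemmas` is a two-column token/lemma table, so a patched copy in which "dive" maps to itself can be passed instead. The helper `override_lemmas()` is hypothetical (it is not part of textstem or lexicon), and the sketch assumes textstem accepts any two-column `data.table` with `token` and `lemma` columns as a dictionary.

```r
library(textstem)
library(data.table)

# Hypothetical helper (not part of textstem or lexicon): returns a new
# token/lemma table in which each name in `overrides` maps to its value.
override_lemmas <- function(dict, overrides) {
  # Build a fresh table so the packaged dictionary is never modified by reference.
  tbl <- data.table(token = as.character(dict[[1]]),
                    lemma = as.character(dict[[2]]))
  # Drop any existing entries for the overridden tokens ...
  keep <- !(tbl$token %in% names(overrides))
  # ... and append the replacement mappings.
  patched <- rbind(tbl[keep],
                   data.table(token = names(overrides), lemma = unname(overrides)))
  setkey(patched, token)
  patched
}

# Assumption: "dive" is currently mapped to "diva" in lexicon::hash_lemmas.
my_lemmas <- override_lemmas(lexicon::hash_lemmas, c(dive = "dive"))

words <- c("dived", "diving", "dive")
lemmatize_strings(words, dictionary = my_lemmas)
```

If textstem uses the patched table as assumed, all three forms should come back as "dive". Because the overrides are a named character vector, further problem words can be added to the same vector instead of patching them one at a time.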

I found a related question ("stemDocment in tm package not working on past tense word"), but it may not help in my case: I have to process more than 80,000 reviews, so I am very likely to run into the same problem with other words.
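To gauge how often this kind of mapping occurs across a large set of reviews, one option is to lemmatize each distinct word once and list every word the dictionary changes, most frequent first, so that odd mappings such as `dive` to `diva` can be checked by hand. This is a sketch: `reviews` is a hypothetical stand-in for the real texts, and the simple regex tokenizer is an assumption, not the tokenization textstem uses internally.

```r
library(textstem)

# Hypothetical stand-in for the real review texts.
reviews <- c("We dived off the boat.", "Diving was great", "Best dive ever")

# Split every review into lowercase word tokens and count them.
tokens <- unlist(strsplit(tolower(reviews), "[^a-z']+"))
tokens <- tokens[tokens != ""]
freq <- table(tokens)

# Lemmatize each distinct token once, using the same dictionary as above.
vocab <- names(freq)
lemmas <- lemmatize_words(vocab, dictionary = lexicon::hash_lemmas)

# Keep only tokens the dictionary actually changed, most frequent first.
audit <- data.frame(token = vocab, lemma = lemmas, n = as.integer(freq),
                    stringsAsFactors = FALSE)
audit <- audit[audit$token != audit$lemma, ]
audit <- audit[order(-audit$n), ]
print(audit)
```

Lemmatizing the vocabulary rather than every occurrence keeps the audit cheap even for tens of thousands of reviews; any mapping judged wrong can then be overridden in a patched copy of the dictionary.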

I also ran lemmatize_strings on my actual dataset, and (unsurprisingly) it gives exactly the same result. Can anyone please help me?

Thank you very much in advance!
