I have a dataframe from tidytext that contains the individual words from some survey free-response comments. It has just shy of 500,000 rows. Being free-response data, it is riddled with typos. Using textclean::replace_misspellings took care of almost 13,000 misspelled words, but there were still ~700 unique misspellings that I manually identified.
I now have a second table with two columns: the first is the misspelling and the second is the correction.
For instance:
allComments <- data.frame("Number" = 1:5, "Word" = c("organization","orginization", "oragnization", "help", "hlp"))
misspellings <- data.frame("Wrong" = c("orginization", "oragnization", "hlp"), "Right" = c("organization", "organization", "help"))
How can I replace all the values of allComments$Word that match misspellings$Wrong with misspellings$Right?
I feel like this is probably pretty basic and my R ignorance is showing....
You can use match to find the index for words from allComments$Word in misspellings$Wrong and then use this index to subset them.
In case the right word is not already in allComments$Word, cast it to character first.
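Here is a minimal sketch of both steps on the example data from the question. The variable names i and j are only illustrative. The as.character calls matter only if the columns are factors, which was the data.frame default before R 4.0; on character columns they do nothing.

```r
# Example data from the question, kept as character columns
allComments <- data.frame(Number = 1:5,
                          Word = c("organization", "orginization", "oragnization", "help", "hlp"),
                          stringsAsFactors = FALSE)
misspellings <- data.frame(Wrong = c("orginization", "oragnization", "hlp"),
                           Right = c("organization", "organization", "help"),
                           stringsAsFactors = FALSE)

# Only needed when the columns are factors: a factor cannot take a value
# that is not already one of its levels
allComments$Word <- as.character(allComments$Word)
misspellings$Right <- as.character(misspellings$Right)

# Position of each word in the lookup table; NA means "not a known misspelling"
i <- match(allComments$Word, misspellings$Wrong)
j <- !is.na(i)

# Overwrite only the matched rows with their corrections
allComments$Word[j] <- misspellings$Right[i[j]]

allComments$Word
# [1] "organization" "organization" "organization" "help"         "help"
```

Because match is vectorised, this is a single pass over the column, so it should be fine for ~500,000 rows and ~700 corrections.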