I have a tibble with three columns:
- wine - Name of the wine
- wine_description - Words describing wine (punctuation has been stripped out)
- target - Binary variable: 1 = top-rated wine, 0 = not top-rated wine
Which R package could I use to identify words that tend to appear in the descriptions of top-rated wines (target = 1)?
I came across Text Mining with R, but it appears to be more about sentiment analysis, which seems close to what I'm trying to achieve but perhaps a bit off the mark. Any suggestions would be welcome.
I am assuming that, once I've completed some basic analysis, I will be able to incorporate the results into a logistic regression.
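As a rough illustration of what that "basic analysis" might look like before any package is involved, the base-R sketch below compares, for each word, the share of top-rated and other wines whose description contains it. The toy data are hypothetical and only mirror the three columns described above; a plain data.frame stands in for the tibble, and the object names (`wines`, `presence`, `word_summary`) are illustrative choices.

```r
# Hypothetical toy data with the same three columns as the question's tibble
wines <- data.frame(
  wine = paste("Wine", 1:10),
  wine_description = c(
    "velvety cherry finish", "velvety dark fruit", "velvety thin finish",
    "sour cherry bright", "sour thin finish", "sour short finish",
    "bright cherry finish", "thin watery finish",
    "velvety sour cherry", "velvety sour short"
  ),
  target = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Unique lower-case words in each description (punctuation is already stripped)
words_per_doc <- lapply(strsplit(tolower(wines$wine_description), "\\s+"), unique)
vocab <- sort(unique(unlist(words_per_doc)))

# Logical matrix: one row per wine, one column per word, TRUE if the word occurs
presence <- t(vapply(words_per_doc, function(w) vocab %in% w,
                     logical(length(vocab))))
colnames(presence) <- vocab

# Share of top-rated vs. other wines whose description contains each word
is_top <- wines$target == 1
word_summary <- data.frame(
  word        = vocab,
  share_top   = colMeans(presence[is_top, , drop = FALSE]),
  share_other = colMeans(presence[!is_top, , drop = FALSE]),
  row.names   = NULL
)
word_summary$diff <- word_summary$share_top - word_summary$share_other

# Words most over-represented among top-rated wines come first
word_summary[order(-word_summary$diff), ]
```

On this toy data "cherry" comes out on top, appearing in 4 of the 5 top-rated descriptions and none of the others. Such raw differences ignore sample size, so they are a starting point for choosing predictors rather than a test.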
A minimal working example would be nice. As far as I can see, all you need is a package that turns your data into a document-feature matrix (dfm), using your wine_description variable as the text field. I like quanteda for that.
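A minimal sketch of that step with quanteda, assuming a recent version (3.x or later) in which `dfm()` is built from a `tokens` object. The toy data are hypothetical; `wine` and `target` travel along as document variables, so the outcome stays attached to each row of the dfm. The `min_docfreq = 2` cut-off is an arbitrary illustrative choice.

```r
library(quanteda)

# Hypothetical toy data with the question's three columns
wines <- data.frame(
  wine = paste("Wine", 1:10),
  wine_description = c(
    "velvety cherry finish", "velvety dark fruit", "velvety thin finish",
    "sour cherry bright", "sour thin finish", "sour short finish",
    "bright cherry finish", "thin watery finish",
    "velvety sour cherry", "velvety sour short"
  ),
  target = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# wine_description becomes the document text; wine and target become docvars
corp <- corpus(wines, text_field = "wine_description")

# Tokenise, then count words per description (dfm() lower-cases by default)
dfmat <- dfm(tokens(corp))

# Optionally drop words that appear in fewer than two descriptions
dfmat_trim <- dfm_trim(dfmat, min_docfreq = 2)

dfmat_trim
docvars(dfmat_trim, "target")
```

Here the trimmed dfm keeps 7 of the 10 distinct words; single-use words such as "watery" are dropped.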
A logistic regression of target on the dfm columns (the word counts) would then be one way to identify which words are associated with top-rated wines.
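One way to wire the dfm into `glm()` is sketched below, under the same hypothetical toy data. Only two hand-picked words ("velvety" and "sour") are used as predictors, purely to keep the model identifiable on ten rows; that selection is an illustrative assumption, not a recommendation for real data.

```r
library(quanteda)

# Hypothetical toy data with the question's three columns
wines <- data.frame(
  wine = paste("Wine", 1:10),
  wine_description = c(
    "velvety cherry finish", "velvety dark fruit", "velvety thin finish",
    "sour cherry bright", "sour thin finish", "sour short finish",
    "bright cherry finish", "thin watery finish",
    "velvety sour cherry", "velvety sour short"
  ),
  target = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

dfmat <- dfm(tokens(corpus(wines, text_field = "wine_description")))

# Keep two candidate words as predictors (hypothetical choice for illustration)
dfmat_sel <- dfm_select(dfmat, pattern = c("velvety", "sour"))

# One row per wine: doc_id plus one count column per selected word
model_data <- convert(dfmat_sel, to = "data.frame")
model_data$target <- docvars(dfmat_sel, "target")

# Positive coefficients point to words associated with top-rated wines
fit <- glm(target ~ velvety + sour, data = model_data, family = binomial)
summary(fit)
```

On these toy rows the fitted coefficient for "velvety" is log(2) (about 0.69) and for "sour" is -log(2). With a real vocabulary of thousands of words, fitting `glm()` on every dfm column tends to break down (more predictors than wines, perfect separation), so a penalized logistic regression such as the lasso from the glmnet package is a common alternative.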