R: Remove words systematically from corpus after processing topic model

199 Views Asked by Jonnytheriver At 06 March 2023 at 10:41

I am doing topic modeling with the topicmodels package on a corpus consisting of three documents.

model <- LDA(dat_dtm, method = "VEM", k = 3, control = list(alpha = 0.1))

Output:

A LDA_VEM topic model with 3 topics.

After that, I use the terms() function to obtain the top 5 words of each topic.

terms(model, 5)

Output with made-up words:

topic 1	topic 2	topic 3
strong	poor	class
wealth	struggle	middle
money	homeless	money
power	money	sufficient
rich	wealth	wealth

As you can see, the words "money" and "wealth" appear in every topic, but they are not really meaningful for my analysis. So I thought it might be a good idea to remove these words from the whole corpus and fit a new topic model without them. I tried to do this automatically by having R look at the top 20 words of each topic and remove from the corpus every word that is among the top 20 in all topics. However, I only got errors, because I am not really familiar with the topicmodels package. Obviously, I could just add these words to the stop word list manually, but maybe there is a more professional way to do it?

Thank you in advance!


Leonardo19 On 07 March 2023 at 10:26

I think the easiest way is to make a vector object of the top 20 words and add it to your stop word list.

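You can use tidyverse together with tidytext (which supplies the tidy() method for topicmodels objects) to extract these words for each topic.

library(tidyverse)
library(tidytext)  # provides tidy() for LDA models fitted with topicmodels

remove_words <- model %>%
  tidy(matrix = "beta") %>%    # one row per topic-term pair with its probability beta
  group_by(topic) %>%
  slice_max(beta, n = 20) %>%  # top 20 terms within each topic
  pull(term) %>%
  unique()                     # a term can be a top term in several topics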

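Now you have a vector called remove_words that holds the top 20 terms of every topic. Add it to your stop word list before fitting a new topic model. Keep in mind that this removes all of those terms, not only the ones that are shared across topics.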
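If only the words that show up among the top terms of every topic should be removed (as the question describes), the per-topic lists can be intersected. The following is a minimal base R sketch, not a tested recipe for your data: it rebuilds the made-up table from the question as a character matrix, which is the shape that terms(model, 5) returns. The names top_terms and shared_words are made up for illustration.

```r
# Stand-in for terms(model, 5): one column per topic, one row per rank.
# With a fitted model you would use: top_terms <- terms(model, 20)
top_terms <- cbind(
  "topic 1" = c("strong", "wealth", "money", "power", "rich"),
  "topic 2" = c("poor", "struggle", "money", "homeless", "wealth"),
  "topic 3" = c("class", "middle", "money", "sufficient", "wealth")
)

# Split the matrix into one character vector per topic.
terms_by_topic <- lapply(seq_len(ncol(top_terms)), function(j) top_terms[, j])

# Keep only the words that occur in every topic's list.
shared_words <- Reduce(intersect, terms_by_topic)

print(shared_words)
```

For the example table this leaves "wealth" and "money", while topic-specific words such as "poor" or "class" stay in the corpus.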

Hope this helps!
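As an alternative to rebuilding the corpus with a longer stop word list, the unwanted terms can also be dropped as columns of the document-term matrix before refitting. The sketch below uses a plain count matrix (dtm_demo, a hypothetical stand-in for dat_dtm) so that it runs without any packages; the removal vector is also made up. I would expect the same column indexing by term name to work on a tm DocumentTermMatrix, but treat that as an assumption and check it on your own object.

```r
# Stand-in for dat_dtm: rows are documents, columns are terms.
dtm_demo <- matrix(
  c(2, 1, 0, 3,
    0, 4, 1, 0,
    1, 2, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("money", "wealth", "poor", "class"))
)

# Hypothetical removal vector, e.g. the words shared by all topics.
words_to_drop <- c("money", "wealth")

# Keep only the term columns that are not in the removal vector.
keep_terms <- setdiff(colnames(dtm_demo), words_to_drop)
dtm_clean <- dtm_demo[, keep_terms, drop = FALSE]

# LDA() rejects documents without any remaining terms, so drop empty rows.
# (For a DocumentTermMatrix, slam::row_sums() would replace rowSums().)
dtm_clean <- dtm_clean[rowSums(dtm_clean) > 0, , drop = FALSE]

print(dtm_clean)
```

After this step the model can be refitted on the reduced matrix with the same call as in the question, for example LDA(dtm_clean, method = "VEM", k = 3, control = list(alpha = 0.1)) when dtm_clean is a real DocumentTermMatrix. Note that doc3 disappears in the demo because it only contained removed words; with just three documents that can change the model noticeably.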
