Details behind "augment" when applied to topic modeling

I have a question about the augment() function from Silge and Robinson's textbook "Text Mining with R: A Tidy Approach". Having fit an LDA model to a corpus, I am applying augment() to assign a topic to each word.

I get results, but I am not sure what takes place "under the hood" of augment(), i.e. how the topic for each word is determined within the Bayesian framework. Is it simply based on the conditional probability formula, estimated after the LDA is fit, using p(topic | word) = p(word | topic) * p(topic) / p(word)?
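To make the formula above concrete, here is a minimal sketch of that Bayes' rule calculation in Python. It only illustrates the calculation I am asking about and is not a claim about what augment() actually does internally. The function name topic_posterior and all the numbers are made up for illustration.

```python
def topic_posterior(word_given_topic, topic_prior):
    """Return p(topic | word) for every word via Bayes' rule.

    word_given_topic[k][w] is p(word w | topic k); topic_prior[k] is p(topic k).
    The result is indexed as posterior[w][k].
    """
    n_topics = len(topic_prior)
    n_words = len(word_given_topic[0])
    posterior = []
    for w in range(n_words):
        # Numerator: p(word | topic) * p(topic) for each topic.
        joint = [word_given_topic[k][w] * topic_prior[k] for k in range(n_topics)]
        # Denominator: p(word), obtained by summing over topics.
        p_word = sum(joint)
        posterior.append([j / p_word for j in joint])
    return posterior


# Hypothetical values for 2 topics and 3 words (not output from a real model).
word_given_topic = [
    [0.7, 0.2, 0.1],  # p(word | topic 0)
    [0.1, 0.3, 0.6],  # p(word | topic 1)
]
topic_prior = [0.5, 0.5]  # p(topic)

posterior = topic_posterior(word_given_topic, topic_prior)
# Assign each word to the topic with the highest posterior probability.
assigned_topic = [max(range(len(p)), key=lambda k: p[k]) for p in posterior]
print(assigned_topic)
```

Under this scheme each word would be assigned to whichever topic maximizes p(topic | word). Whether augment() uses this rule, or something else stored in the fitted model, is exactly what I would like to understand.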

I would appreciate it if someone could provide the statistical details of how augment() does this. Could you also point me to references or papers where this is documented?

There is 1 solution below

The tidytext package is open source and on GitHub, so you can dig into the code for augment() yourself. I'd suggest looking at:

  • augment() for LDA from the topicmodels package
  • augment() for the structural topic model from the stm package

To learn more about these approaches, there is an excellent paper/vignette on the structural topic model, and I like the Wikipedia article for LDA.