I have a question about the augment() function from Silge and Robinson's "Text Mining with R: A Tidy Approach". Having run an LDA on a corpus, I am applying augment() to assign a topic to each word.
I get the results, but I am not sure what takes place "under the hood" in augment(), i.e. how the topic for each word is determined within the Bayesian framework. Is it just based on the conditional probability formula, estimated after the LDA is fit, using p(topic | word) = p(word | topic) * p(topic) / p(word)?
I would appreciate it if someone could provide the statistical details of how augment() does this, along with references to papers where this is documented.
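To make my question concrete, here is the calculation I am guessing at, sketched with posterior() from topicmodels; the AssociatedPress subset, the number of topics, and the seed are just stand-ins for my own corpus and model:

```r
library(topicmodels)

# A small stand-in for my corpus: a few AssociatedPress documents, k = 4 topics
data("AssociatedPress", package = "topicmodels")
lda_fit <- LDA(AssociatedPress[1:50, ], k = 4, control = list(seed = 123))

post <- posterior(lda_fit)
p_word_given_topic <- post$terms   # k x V matrix: p(word | topic)
p_topic_given_doc  <- post$topics  # D x k matrix: p(topic | document)

# Pick one document and one word from the vocabulary
doc  <- 1
word <- colnames(p_word_given_topic)[1]

# The calculation I am guessing at:
# p(topic | word, doc) proportional to p(word | topic) * p(topic | doc)
scores <- p_word_given_topic[, word] * p_topic_given_doc[doc, ]
scores / sum(scores)   # normalised posterior over topics for this token
which.max(scores)      # the topic I would expect augment() to assign
```

Is this (or something like it) what augment() is doing, or is the assignment taken from somewhere else in the fitted model?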
The tidytext package is open source and on GitHub, so you can dig into the code for augment() yourself. I'd suggest looking at:

- augment() for LDA from the topicmodels package
- augment() for the structural topic model from the stm package

To learn more about these approaches, there is an excellent paper/vignette on the structural topic model, and I like the Wikipedia article for LDA.
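If it helps to see what augment() produces before reading the source, here is a minimal example along the lines of the book's topic modeling chapter; the AssociatedPress subset, the number of topics, and the seed are arbitrary choices for illustration:

```r
library(topicmodels)
library(tidytext)

data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]
lda_fit <- LDA(dtm, k = 4, control = list(seed = 123))

# Passing the original document-term matrix back lets augment() return the
# count of each term in each document alongside the assigned .topic
assignments <- augment(lda_fit, data = dtm)
assignments
#> # A tibble with columns: document, term, count, .topic
```

From there, the augment() method for LDA objects in the tidytext source shows exactly how the .topic column is computed.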