For several dfms, I have no problem converting them to stm/lda/topicmodels format. However, if I weight the dfms with dfm_tfidf() before converting, I get the following error:
Error in convert.dfm(users_dfm, to = "stm") : cannot convert a non-count dfm to a topic model format
Any idea why this might be? I've tried different weighting schemes for both term and document frequency (to try and make the weighted dfm a 'count' dfm), but I keep getting the error.
So, this works:
users_dfm <- dfm(users_tokens)
users_stm <- convert(users_dfm, to = "stm")
But this doesn't:
users_dfm <- dfm(users_tokens)
weighted_dfm <- dfm_tfidf(users_dfm)
users_stm <- convert(weighted_dfm, to = "stm")
Thanks!
This happens because topic models require counts as inputs: the statistical distribution assumed by the latent Dirichlet allocation model describes word counts. tf-idf weighting turns the entries of the dfm into non-integer values, which are not valid inputs for stm (or any other topic model).
So in short, don't weight your dfm before using it with a topic model.
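To see why the weighted matrix is rejected, here is a minimal base-R sketch of tf-idf on a toy count matrix. The numbers are made up, and the formula (raw count times log10 of N over document frequency) is what I understand the dfm_tfidf() defaults to be, so treat it as an illustration rather than the exact quanteda internals.

```r
# Toy document-term count matrix: 2 documents x 3 terms (made-up numbers).
counts <- matrix(c(2, 0, 1,
                   1, 3, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# Inverse document frequency: log10(N / number of documents containing the term).
n_docs   <- nrow(counts)
doc_freq <- colSums(counts > 0)
idf      <- log10(n_docs / doc_freq)

# tf-idf: scale each term's column of counts by that term's idf.
tfidf <- sweep(counts, 2, idf, `*`)

all(counts == round(counts))  # TRUE: the raw matrix holds whole-number counts
all(tfidf == round(tfidf))    # FALSE: the weighted matrix holds fractional values
```

Once the entries are fractional, the matrix no longer looks like count data, which is the condition the error message is complaining about.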
You should also note that conversion of a dfm to the stm format is not strictly required, since stm::stm() can take a dfm object directly as an input.
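As a rough sketch of passing a dfm straight to stm::stm(), the example below uses quanteda's bundled data_corpus_inaugural as a stand-in for users_tokens. The preprocessing steps, K = 5, and the low iteration cap are arbitrary choices for illustration, not recommendations.

```r
library(quanteda)
library(stm)

# Stand-in data: the inaugural address corpus that ships with quanteda.
inaug_tokens <- tokens(data_corpus_inaugural, remove_punct = TRUE)
inaug_tokens <- tokens_remove(inaug_tokens, stopwords("en"))

# Build an unweighted (count) dfm and drop rare terms to keep the run short.
inaug_dfm <- dfm(inaug_tokens)
inaug_dfm <- dfm_trim(inaug_dfm, min_termfreq = 5)

# Fit directly on the count dfm: no convert() call and no dfm_tfidf().
fit <- stm(inaug_dfm, K = 5, max.em.its = 10, verbose = FALSE, seed = 1)

# Inspect the highest-probability words for each topic.
labelTopics(fit)
```

If you want to downweight uninformative terms, trimming or removing features from the count dfm (as with dfm_trim() here) keeps the input valid, whereas tf-idf weighting does not.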