Error converting to stm after tf-idf weighting

For several dfms, I have no problem converting them to stm/lda/topicmodels format. However, if I weight the dfms with dfm_tfidf() before converting, I get the following error:

Error in convert.dfm(users_dfm, to = "stm") : cannot convert a non-count dfm to a topic model format

Any idea why this might be? I've tried different weighting schemes for both term and document frequency (to try and make the weighted dfm a 'count' dfm), but I keep getting the error.

So, this works:

users_dfm <- dfm(users_tokens) 
users_stm <- convert(users_dfm, to = "stm")

But this doesn't:

users_dfm <- dfm(users_tokens)
weighted_dfm <- dfm_tfidf(users_dfm)
users_stm <- convert(weighted_dfm, to = "stm")

Thanks!

This is because topic models require counts as inputs. Latent Dirichlet allocation, and models built on it such as stm, assume that the words in each document are drawn from a multinomial distribution, so the input has to be a matrix of non-negative integer counts. tf-idf weighting turns the dfm into non-integer values, which are not valid inputs for stm (or the other topic model formats), and that is why convert() refuses the conversion.
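To see why the weighted matrix stops being a count matrix, here is a small base R sketch on a made-up toy matrix. It computes tf-idf by hand as count * log10(N / df), which, as far as I know, matches the dfm_tfidf() defaults, but treat that formula as an assumption. The is_count() helper is hypothetical and only for illustration.

```r
# Toy document-term count matrix: 3 documents, 3 terms (made-up data).
counts <- matrix(
  c(2, 0, 1,
    1, 1, 0,
    0, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("apple", "banana", "cherry"))
)

# Document frequency: number of documents in which each term occurs.
doc_freq <- colSums(counts > 0)

# Inverse document frequency, assuming log10(N / df).
idf <- log10(nrow(counts) / doc_freq)

# tf-idf: multiply each column of counts by that term's idf.
tfidf <- sweep(counts, 2, idf, `*`)

# Hypothetical helper: TRUE only for non-negative whole numbers.
is_count <- function(m) all(m >= 0 & m == round(m))

is_count(counts)  # the raw counts pass the check
is_count(tfidf)   # the weighted values do not
print(tfidf)
```

The weighted cells are fractions of the original counts, so there is no longer a "number of times word w occurred" for the model to work with.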

So in short, don't weight your dfm before using it with a topic model.

You should also note that conversion of a dfm to the stm format is not strictly required, since stm::stm() can take a dfm object directly as an input.
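A minimal sketch of that direct route, assuming the quanteda and stm packages are installed. It uses quanteda's built-in data_corpus_inaugural as a stand-in for users_tokens, and K = 5, max.em.its = 10 and the min_termfreq cutoff are arbitrary values chosen only to keep the example quick.

```r
library(quanteda)
library(stm)

# Stand-in for users_tokens from the question: tokens from a built-in corpus.
example_tokens <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# Keep the dfm as raw counts; do not apply dfm_tfidf() before fitting.
example_dfm <- dfm(example_tokens)

# Optional: drop very rare terms to keep the vocabulary small (arbitrary cutoff).
example_dfm <- dfm_trim(example_dfm, min_termfreq = 5)

# stm() takes the dfm directly, so convert(..., to = "stm") can be skipped.
example_fit <- stm(example_dfm, K = 5, max.em.its = 10, verbose = FALSE)

# Inspect the top words per topic.
labelTopics(example_fit, n = 5)
```

If you want to down-weight very common terms, trimming or removing them from the count dfm (for example with dfm_trim() or dfm_remove()) keeps the input as counts, which weighting does not.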