I have used the textProcessor and the prepDocuments functions from the stm package to clean a corpus.
Now I would like to convert the resulting object (a list of term indices and counts, plus a vocabulary) into a standard document-term matrix (or a quanteda document-feature matrix) so that I can apply the LDA function from the topicmodels package and compare the resulting topics with those from stm.
library(stm)
library(topicmodels)
processed <- textProcessor(poliblog5k.docs,
                           metadata = poliblog5k.meta,
                           language = "en")
prepped <- prepDocuments(processed$documents,
                         processed$vocab,
                         processed$meta,
                         lower.thresh = 20)
LDA(processed)
LDA(prepped)
> Error in x != vector(typeof(x), 1L)
LDA(processed$documents)
LDA(prepped$documents)
> Error in !all.equal(x$v, as.integer(x$v))
I had the same problem. What I did was transform the output from prepDocuments to a one-term-per-document-per-row format and then apply the cast_dfm function from the tidytext package.
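Here is a minimal sketch of that conversion. To keep it self-contained it uses a small hand-built list, `toy_prepped`, as a hypothetical stand-in for the output of prepDocuments; it assumes the usual stm layout, where each document is a two-row matrix with vocabulary indices in the first row and counts in the second. The names `toy_prepped`, `tidy_counts`, `dfm` and `dtm` are mine, not from the question. cast_dfm needs quanteda installed and cast_dtm needs tm.

```r
library(tidytext)

# Hypothetical stand-in for the output of prepDocuments():
# each document is a 2-row matrix (row 1 = vocab index, row 2 = count).
toy_prepped <- list(
  documents = list(
    `1` = rbind(c(1L, 2L), c(2L, 1L)),
    `2` = rbind(c(2L, 3L), c(1L, 3L)),
    `3` = rbind(c(1L, 3L), c(1L, 1L))
  ),
  vocab = c("econom", "tax", "vote")
)

# One row per (document, term) pair, with the index mapped back to the term.
tidy_counts <- do.call(rbind, lapply(seq_along(toy_prepped$documents), function(i) {
  d <- toy_prepped$documents[[i]]
  data.frame(document = names(toy_prepped$documents)[i],
             term     = toy_prepped$vocab[d[1, ]],
             count    = d[2, ],
             stringsAsFactors = FALSE)
}))

# quanteda document-feature matrix
dfm <- cast_dfm(tidy_counts, document, term, count)

# tm-style DocumentTermMatrix, the input class topicmodels works with
dtm <- cast_dtm(tidy_counts, document, term, count)
```

With the real data you would replace `toy_prepped` with `prepped` from the question, and the resulting `dtm` should then be usable as `LDA(dtm, k = ...)` with whatever number of topics you fit in stm. Note that prepDocuments can drop documents that end up empty, so the document ids may not line up one-to-one with the original corpus.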