Training an LDA model with gensim from an external tf-idf matrix and term list


I already have a tf-idf matrix, with rows for terms and columns for documents. Now I want to train an LDA model on this term-document matrix. The first step seems to be using gensim.matutils.Dense2Corpus to convert the matrix into gensim's corpus format. But how do I construct the id2word parameter? I have the list of terms (one per matrix row), but I don't know the format gensim expects for the dictionary, so I cannot build it with functions like gensim.corpora.Dictionary.load_from_text. Any suggestions? Thank you.


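id2word must map each integer id to its term (a string). With Dense2Corpus, the id of a term is its row index in the matrix, so id i must map to the term in row i.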

In other words, it must support lookups like id2word[123] == 'koala'.

A plain Python dict is the easiest option, e.g. dict(enumerate(terms)) when terms is listed in row order.