Latent Dirichlet Allocation Implementation with Gensim

I am working on a project on LDA topic modelling, and I used gensim (Python) for it. I read some references saying that, to get the best topic model, there are two parameters we need to determine: the number of passes and the number of topics. Is that true? For the number of passes, we look for the point at which the results become stable. For the number of topics, we pick the number that gives the lowest value (e.g. of perplexity).

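num_topics = 10      # number of topics to extract
chunksize = 2000     # number of documents used in each training chunk
passes = 20          # number of full passes through the corpus during training
iterations = 400     # max inference iterations per document when estimating its topic mix
eval_every = None    # how often to log a perplexity estimate; None disables it (faster training)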

Also, is it necessary to set all of these parameters in the gensim library?

Answer

The quality of an LDA model depends mostly on the number of topics. More passes generally give a more accurate model, up to the point where training converges, but each extra pass also makes training take longer.

Of course, it is not necessary to set all the parameters; most of the time you will pass only the required arguments and keep the defaults for the rest. To find the optimal number of topics, you can compute the c_v coherence for each candidate number of topics in a given grid and pick the one with the highest coherence. Coherence is generally a better metric than perplexity because it agrees more closely with human judgments of topic quality.