Is it possible to set the initial topic assignments for scikit-learn LDA?

114 Views Asked by ComplexGates At 17 August 2025 at 13:59

Instead of setting the topic_word_prior as a parameter, I would like to initialize the topics according to a pre-defined distribution over words. How would I set this initial topic distribution in sklearn's implementation? If it's not possible, is there a better implementation to consider?

There is 1 best solution below

Sara On 23 April 2019 at 23:35 BEST ANSWER

If you have a predefined distribution over words from a pre-trained model, you can simply pass a bag-of-words corpus (bow_corpus) through that model. Gensim's LDA and LdaMallet can both be trained once and then applied to a new data set, allocating its documents to topics without changing the topics themselves.
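For readers unfamiliar with the term, a bag-of-words corpus represents each document as (token_id, count) pairs over a fixed vocabulary. The sketch below imitates that idea in plain Python; `build_vocab` and `to_bow` are hypothetical helpers written for illustration, not Gensim functions, and the id assignment is a simplification that will not match the ids Gensim's Dictionary produces.

```python
from collections import Counter


def build_vocab(docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs; tokens missing from vocab are dropped."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())


# Hypothetical toy data: the vocabulary is built from the training documents only.
train_docs = [["topic", "model", "topic"], ["word", "model"]]
vocab = build_vocab(train_docs)

# A new document is mapped with the SAME vocabulary, so its ids line up with
# whatever was learned from the training data; unseen tokens are ignored.
new_doc = ["model", "unseen", "word", "word"]
new_bow = to_bow(new_doc, vocab)
```

The point of the sketch is the reuse of `vocab` for the new document: a model trained on one id mapping can only interpret new data encoded with that same mapping.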

Steps:

1. Import your data.
2. Clean your data: remove punctuation and numbers, remove stop words, then lemmatize or stem the tokens.

3. Create a dictionary:

dictionary = gensim.corpora.Dictionary(processed_docs)
dictionary.filter_extremes(no_below=15, no_above=0.5, keep_n=100000)

4. Define a bow corpus:

bow_corpus = [dictionary.doc2bow(doc) for doc in processed_docs]

5. Train your model (skip this step if it's already trained). Note that gensim.models.wrappers was removed in Gensim 4.0, so this call needs a 3.x release:

ldamallet = gensim.models.wrappers.LdaMallet(mallet_path,
            corpus=bow_corpus, num_topics=15, id2word=dictionary)

6. Import your new data and repeat steps 1, 2, and 4, reusing the dictionary from step 3 so that the word ids match the ones the model was trained on.

7. Pass your new data through your model like this:

  ldamallet[bow_corpus_new]

Your new data is now allocated to topics, and you can write the result to a CSV.
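The CSV export could be sketched as follows, assuming the model returns, for each document, a list of (topic_id, weight) pairs. The `allocations` list is made-up stand-in data rather than real model output, and `write_topic_csv` is a hypothetical helper name.

```python
import csv
import io


def write_topic_csv(allocations, num_topics, out):
    """Write one row per document with one weight column per topic."""
    writer = csv.writer(out)
    writer.writerow(["doc_id"] + [f"topic_{k}" for k in range(num_topics)])
    for doc_id, doc_topics in enumerate(allocations):
        # Topics absent from a document's list get a weight of 0.0.
        weights = [0.0] * num_topics
        for topic_id, weight in doc_topics:
            weights[topic_id] = weight
        writer.writerow([doc_id] + weights)


# Stand-in for the per-document topic weights a trained model would return.
allocations = [[(0, 0.7), (2, 0.3)], [(1, 1.0)]]

# An in-memory buffer keeps the example self-contained; pass a file opened
# with open(path, "w", newline="") to write to disk instead.
buffer = io.StringIO()
write_topic_csv(allocations, num_topics=3, out=buffer)
csv_text = buffer.getvalue()
```

A wide layout (one column per topic) is used here because it loads directly into a spreadsheet; a long layout with doc_id, topic_id, weight columns would work just as well.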
