Is it possible to set the initial topic assignments for scikit-learn LDA?
Instead of setting the topic_word_prior as a parameter, I would like to initialize the topics according to a pre-defined distribution over words. How would I set this initial topic distribution in sklearn's implementation? If it's not possible, is there a better implementation to consider?
Asked by ComplexGates (137 views)
There is 1 answer below.
If your predefined distribution over words comes from a pre-trained model, you can simply pass a bag-of-words corpus (bow_corpus) through that model, as if it were a function. Gensim's LDA and LdaMallet can both be trained once; you can then pass a new data set through the trained model for topic allocation without changing the topics.
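Steps:
1. Create a dictionary.
2. Define a bag-of-words (bow) corpus.
3. Train your model (skip this if it's already trained).
4. Import your new data and repeat the preparation steps for it, reusing the dictionary from step 1 so that the word ids match the trained model.
5. Pass your new data through your model. In Gensim, indexing the trained model with a bag-of-words document, e.g. lda_model[new_bow], returns that document's topic allocation.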
Your new data is now allocated to topics, and you can write the result to a CSV file.