How to understand the "phi values" in a gensim LDA model


From the document's point of view, I want to get the term probability of each topic for each document from a gensim LdaModel. Here is what I tried:

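from gensim.models import LdaModel

# corpus (bag-of-words list) and dictionary are built beforehand
lda_model = LdaModel(corpus, id2word=dictionary, num_topics=50)

# phi relevance values for corpus[1] (element [2] of the returned tuple)
phi_doc1 = lda_model.get_document_topics(corpus[1],
                                         minimum_probability=0.05, per_word_topics=True)[2]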


phi_doc1
---
[(52, [(8, 19.999924)]),
 (69, [(8, 666.9981)]),
 (241, [(8, 30.999844)]),
 (482, [(8, 0.9999151)]),
 (593, [(8, 5.9999304)])]

but I couldn't understand what the values mean.

I want to know the meaning of the phi relevance values. Reading the help message didn't make it clear:


help(lda_model.get_document_topics)

--
" ...
Phi relevance values, multiplied by the feature length, 
for each word-topic combination.
Each element in the list is a pair of a word's id and 
a list of the phi values between this word and each topic..."

What do the values in lda_model.get_document_topics(corpus[1], minimum_probability=0.05, per_word_topics=True)[2] mean?

Are these "the term probability of the topic for each document"?


Answer by kmklim:

My understanding is that the result you received is a list of (word id, [(topic number, phi value), ...]) pairs, one per word in the document. What you wanted is the probability of each topic for the document.
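A minimal sketch in plain Python that reads the output posted in the question. It assumes, as the docstring says, that each value is a phi value multiplied by the word's count in the document, and that topics filtered out by minimum_phi_value carry negligible weight. approx_counts is a hypothetical helper, not part of gensim:

```python
# phi_doc1 as posted in the question: (word_id, [(topic_id, count-scaled phi), ...])
phi_doc1 = [
    (52, [(8, 19.999924)]),
    (69, [(8, 666.9981)]),
    (241, [(8, 30.999844)]),
    (482, [(8, 0.9999151)]),
    (593, [(8, 5.9999304)]),
]


def approx_counts(word_phis):
    """Hypothetical helper: estimate how often each word occurs in the document.

    For one word, the unscaled phi values sum to 1 across all topics, so the
    count-scaled values summed over the listed topics give (approximately)
    the word's count, assuming the filtered-out topics are negligible.
    """
    return {word_id: round(sum(value for _, value in topic_phis))
            for word_id, topic_phis in word_phis}


print(approx_counts(phi_doc1))  # → {52: 20, 69: 667, 241: 31, 482: 1, 593: 6}
```

Read this way, word 69 occurs about 667 times in corpus[1], and essentially all of those occurrences are attributed to topic 8; dictionary[69] gives the actual token.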

If you only need the document's topic probabilities, use per_word_topics=False (the default) in get_document_topics(). It returns a list of (topic, probability) tuples for the document. More here: https://radimrehurek.com/gensim/models/ldamodel.html

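Phi values are relative measures of how a word's occurrences are distributed across topics. For a word w and a topic k, phi is the (variational) probability that an occurrence of w in this document was generated by topic k. gensim multiplies it by the word's count in the document (what the docstring calls the "feature length"), so for each word the listed values add up to roughly how many times it occurs, less any topics dropped by minimum_phi_value. They also show which words push the document toward a topic (topic 8 in your case): the document's unnormalized topic weights (gamma) are the prior alpha plus these scaled values summed over the document's words. For the underlying inference, see: https://miningthedetails.com/LDA_Inference_Book/lda-inference.html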
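To make the definition concrete, here is a simplified sketch under stated assumptions: it treats a hypothetical document's topic proportions theta and the topics' word distributions beta as known point estimates, computes for each word the posterior probability that it was generated by each topic, phi[w][k] proportional to theta[k] * beta[k][w], and then scales by the word count. gensim's variational inference uses exp(E[log theta]) and exp(E[log beta]) in place of these point estimates, so its numbers differ, but the normalize-then-scale structure is the same. scaled_phi and all numbers below are hypothetical:

```python
def scaled_phi(bow, theta, beta):
    """Hypothetical helper: [(word_id, [(topic_id, count * phi), ...])] for one document.

    Simplified: uses point estimates theta[k] and beta[k][w] instead of the
    expected-log quantities that gensim's variational inference uses.
    """
    result = []
    for word_id, count in bow:
        # unnormalized weight of each topic for this word
        weights = [theta[k] * beta[k][word_id] for k in range(len(theta))]
        norm = sum(weights)
        # phi = weight / norm sums to 1 over topics; scaling by count gives
        # values that sum to the word's count, like gensim's output
        result.append((word_id, [(k, count * w / norm) for k, w in enumerate(weights)]))
    return result


# Hypothetical 2-topic model over a 3-word vocabulary.
theta = [0.9, 0.1]          # the document's topic proportions
beta = [[0.7, 0.2, 0.1],    # topic 0 word distribution
        [0.1, 0.2, 0.7]]    # topic 1 word distribution
bow = [(0, 3), (2, 1)]      # word 0 occurs 3 times, word 2 once

for word_id, topic_phis in scaled_phi(bow, theta, beta):
    print(word_id, [(k, round(v, 4)) for k, v in topic_phis])
```

Word 2 is much more likely under topic 1 in beta, yet only about 44% of its single occurrence is attributed to topic 1, because this document is mostly topic 0.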