Create a pandas table (DataFrame) with a row for each topic (cluster). Add the following columns for each topic:
- 3 columns containing the 3 words most similar to the topic
- 3 columns containing the 3 documents most similar to the topic
- 3 columns containing the similarity scores between the topic and each of the 3 documents from the previous item
Hint: one way to make a DataFrame is to first build a two-dimensional Python list and then create the DataFrame from that list.
This is the idea, but it does not work:
import pandas as pd
data = []
for topic_id in range(model.get_num_topics()):
    # Get the top 3 words for the topic
    topic_words = model.topic_words[topic_id][:3]
    # Get the top 3 similar documents for the topic
    doc_indices = model.topic_doc_indices[topic_id][:3]
    similar_docs = [facts_list[idx] for idx in doc_indices]
    # Get the similarity scores between the top 3 documents and the topic
    similarity_scores = model.get_document_topic_similarity(doc_indices, topic_id)
    # Append the information for the current topic to the data list
    data.append([topic_id, topic_words, similar_docs, similarity_scores])
columns = ['Topic', 'Top 3 Words', 'Top 3 Similar Docs', 'Similarity Scores']
df = pd.DataFrame(data, columns=columns)
print(df)
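For comparison, here is a minimal sketch of the list-of-rows idea from the hint, producing one column per word, document, and score instead of one list-valued column per group. The data is made up, since `model` and `facts_list` are not shown. The names `build_topic_table`, `topic_words`, `topic_docs`, and the column labels are hypothetical, and every topic is assumed to have at least three words and three documents.

```python
import pandas as pd

# Hypothetical stand-in data; in the question these values would come from `model`.
# One list of words per topic, already ordered from most to least similar.
topic_words = [
    ["cat", "dog", "pet", "fur"],
    ["sun", "moon", "star", "sky"],
]
# One list of (document text, similarity score) pairs per topic, in any order.
topic_docs = [
    [("Cats sleep a lot.", 0.71), ("Dogs bark.", 0.88),
     ("Pets need care.", 0.64), ("Fur sheds.", 0.40)],
    [("The sun is a star.", 0.93), ("The moon orbits Earth.", 0.81),
     ("Stars twinkle.", 0.77)],
]


def build_topic_table(topic_words, topic_docs, top_n=3):
    """Build one flat row per topic: id, top_n words, top_n docs, top_n scores."""
    rows = []
    for topic_id, (words, docs) in enumerate(zip(topic_words, topic_docs)):
        # Keep the top_n documents with the highest similarity score.
        best = sorted(docs, key=lambda pair: pair[1], reverse=True)[:top_n]
        texts = [text for text, _ in best]
        scores = [score for _, score in best]
        # Unpack the lists so that every value gets its own column.
        rows.append([topic_id, *words[:top_n], *texts, *scores])

    columns = (
        ["topic"]
        + [f"word_{i}" for i in range(1, top_n + 1)]
        + [f"doc_{i}" for i in range(1, top_n + 1)]
        + [f"score_{i}" for i in range(1, top_n + 1)]
    )
    return pd.DataFrame(rows, columns=columns)


df = build_topic_table(topic_words, topic_docs)
print(df)
```

The key difference from the attempt above is the unpacking with `*`: each row becomes a flat list of ten values, so the two-dimensional list maps directly onto ten columns.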
There is insufficient information to correctly answer the question. IIUC, this is what I would do:
Output: