How to convert a matrix to BoW format?

102 Views Asked by Yu Fu At 11 April 2022 at 05:20

I am trying to convert a matrix to a type that can be accepted by gensim's AuthorTopicModel, which means I should convert the matrix to sparse vectors. I have already tried several functions in gensim, such as gensim.matutils.full2sparse and gensim.matutils.any2sparse, but something goes wrong:

my code:

import numpy
from gensim.matutils import any2sparse

matrix = numpy.array([[1, 0, 1], [0, 1, 1]])
mycorpus = any2sparse(matrix)
print(matrix)
print(mycorpus)

the output:

[[1 0 1]
 [0 1 1]]

[(0, 1.0), (0, 1.0), (1, 0.0), (1, 0.0)] #mycorpus

According to the tutorial, mycorpus should look like:

[[(0, 1), (2, 1)],
 [(1, 1), (2, 1)]]

I have no idea what's wrong. I would really appreciate it if anyone could give me some advice.


There is 1 best solution below

gojomo On 11 April 2022 at 17:10

The Gensim AuthorTopicModel docs describe its desired corpus-format as iterable of list of (int, float).

Those int values would be word-ids, ideally accompanied by the id2word dict that identifies which int means which word.
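To make that format concrete, here is a minimal sketch of one BoW document and an id2word mapping. The three words are made up for illustration, and the describe helper is hypothetical, not part of Gensim:

```python
# Hypothetical vocabulary: these words are invented for illustration only.
id2word = {0: "apple", 1: "banana", 2: "cherry"}

# One document in BoW form: a list of (word_id, count) pairs.
doc_bow = [(0, 1.0), (2, 1.0)]

def describe(bow, id2word):
    """Map each (word_id, count) pair back to a (word, count) pair."""
    return [(id2word[word_id], count) for word_id, count in bow]

readable = describe(doc_bow, id2word)
print(readable)
```

A full corpus is then just an iterable of such lists, one per document.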

What's the source of your matrix, & do you know if it's the rows or the columns that represent words, and have a mapping of indexes to words? That will drive the conversion.
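As a rough sketch of that conversion, assuming the matrix is a small dense NumPy array with one document per row and one word-id per column: the dense_to_bow helper and its docs_are_rows flag below are hypothetical names, not Gensim functions. It converts each document separately, since any2sparse appears to expect a single vector rather than a whole 2-D matrix, which would explain the odd output in the question.

```python
import numpy as np

def dense_to_bow(matrix, docs_are_rows=True, eps=1e-9):
    """Turn a dense document-term matrix into a list of BoW documents.

    Hypothetical helper: set docs_are_rows=False if each column,
    rather than each row, is a document.
    """
    arr = np.asarray(matrix, dtype=float)
    if not docs_are_rows:
        arr = arr.T  # make every row a document
    corpus = []
    for row in arr:
        # Indexes of the entries that count as non-zero in this document.
        ids = np.nonzero(np.abs(row) > eps)[0]
        corpus.append([(int(i), float(row[i])) for i in ids])
    return corpus

matrix = np.array([[1, 0, 1], [0, 1, 1]])
corpus = dense_to_bow(matrix)
print(corpus)  # [[(0, 1.0), (2, 1.0)], [(1, 1.0), (2, 1.0)]]
```

Gensim's matutils also has a Dense2Corpus wrapper for dense arrays. As far as I recall it treats columns as documents by default, so check the orientation against the docs for your version before relying on it.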

Also, as the docs mention, "The model is closely related to LdaModel. The AuthorTopicModel class inherits LdaModel, and its usage is thus similar."

Have you reviewed guides to Gensim LDA usage, such as the multiple Usage Examples, to see how they prepare their corpus and whether that suggests the necessary steps & formats?

Or, is your corpus still available as texts, so you can directly use the examples there as a model to turn the text into the BoW format (rather than your already-processed matrix)?

If you're still having problems, you should expand your question text with more details, especially how the true corpus matrix that you have was created, and which errors you've encountered (& how you triggered them) that convince you things aren't working.
