I want to use phrases in Doc2Vec, so I'm using gensim's `Phrases`. Doc2Vec needs tagged documents to train the model, and I can't tag the phrases. How can I do this?
Here is my code:
```python
text = phrases.Phrases(text)
for i in range(len(text)):
    string1 = "SENT_" + str(i)
    sentence = doc2vec.LabeledSentence(tags=string1, words=text[i])
    text[i] = sentence
print "Training model..."
model = Doc2Vec(text, workers=num_workers, \
            size=num_features, min_count = min_word_count, \
            window = context, sample = downsampling)
```
The invocation of `Phrases()` trains a phrase-creating model. You later apply that model to text to get back phrase-combined text.

Don't replace your original `text` with the trained model, as on your code's first line. Also, don't try to assign into the `Phrases` model, as happens in your current loop, nor access the `Phrases` model by integer index.

The gensim docs for the `Phrases` class have examples of its proper use; if you follow that pattern you'll do well.

Further, note that `LabeledSentence` has been replaced by `TaggedDocument`, and its `tags` argument should be a list of tags. If you provide a string, it will treat it as a list of one-character tags (instead of the one tag you intend).