Error code 126/127 when using mallet on Google colab

from gensim.models.wrappers import LdaMallet
# mallet_path = 'C:/Users/kmuth/Downloads/mallet-2.0.8/bin/mallet' # update this path
mallet_path = '/content/drive/MyDrive/data/mallet/mallet-2.0.8/bin/mallet'
# LdaMallet is imported directly above, so use that name rather than the full gensim.models.wrappers path
ldamallet = LdaMallet(mallet_path, corpus=doc_term_matrix, num_topics=15, id2word=dictionary)

I'm currently getting this error: CalledProcessError: Command '/content/drive/MyDrive/data/mallet/mallet-2.0.8/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/d20b66_corpus.txt --output /tmp/d20b66_corpus.mallet' returned non-zero exit status 126.

It happens when I try to create an LDA model using Mallet, which I have read does a better job than the built-in LDA model in the gensim package.

I'm trying to follow this tutorial (for the Mallet part): https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/#14computemodelperplexityandcoherencescore

I would really appreciate any help, as I have no idea what the error means. Is it not finding the file? Do I have to install it? I'm pretty much a noob.

I have tried changing the mallet path, pointing it at both my PC and my Drive, to no avail.

Thank you, Sean

Answer by gojomo

IIUC, Google Colab runs on Google's servers.

Did you compile/install the native mallet executable (on which gensim.models.wrappers.LdaMallet depends) at a path, and in a format, that is accessible to the notebook and from which the Google Colab notebook can execute it? (Is that even allowed in Google Colab?)
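
For context on the exit statuses in the question title: by shell convention, 126 means the command was found but could not be executed (typically a missing execute permission), while 127 means the command was not found at all. The following is a minimal sketch for checking which case applies from inside the notebook. The helper names `diagnose_executable` and `make_executable` are hypothetical (they are not part of gensim or Mallet), and the sketch assumes a Linux runtime such as Colab's.

```python
import os
import stat


def diagnose_executable(path):
    """Return a short label describing whether `path` can be run as a command."""
    if not os.path.exists(path):
        # A shell would typically report exit status 127 (command not found).
        return "missing"
    if os.path.isdir(path):
        return "is a directory"
    if not os.access(path, os.X_OK):
        # A shell would typically report exit status 126 (found, not executable).
        return "not executable"
    return "ok"


def make_executable(path):
    """Add the execute bit for user, group, and others, keeping other mode bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Path copied from the question; adjust it to wherever Mallet was unpacked.
    mallet_path = "/content/drive/MyDrive/data/mallet/mallet-2.0.8/bin/mallet"
    status = diagnose_executable(mallet_path)
    print(status)
    if status == "not executable":
        make_executable(mallet_path)
        print(diagnose_executable(mallet_path))
```

If the execute bit does not stick on a mounted Drive path (an assumption worth testing, since mounted filesystems do not always preserve permissions), one option to try is copying the Mallet folder to the runtime's local disk and pointing mallet_path there. Note also that bin/mallet is a launcher script for a Java program, so Java has to be available on the runtime as well.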

Note also that the latest (4.0+) versions of Gensim have eliminated the wrappers as somewhat awkward to use, hard-to-maintain, & somewhat redundant with other implementations. So you may want to consider using Gensim's own LdaModel instead of this no-longer-supported wrapper of another package.