How to import a file into MALLET for topic modelling


I want to use MALLET for topic modelling and I have a question. My data is in a single file, one instance per line, but I did not include any label or instance name, so each line starts directly with the text. Are labels or instance names required?


There is 1 solution below


I am not sure exactly what you want. For me, on Windows, I put all my data in a folder such as "D:\Data\test1". The "test1" folder contains a number of .txt files, each of which is one instance. Then I run

bin\mallet import-dir --input D:\Data\test1 --output test1.mallet --keep-sequence --remove-stopwords --extra-stopwords extra.txt

to generate test1.mallet, the instance file that the topic model is then trained on. With import-dir, each file is its own instance, so you do not need to write instance names into the text yourself.

I hope this helps. By the way, you can generate the separate .txt files using a Word or Excel macro.
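
If you would rather not write a macro, a short script can do the same split. The following is only a minimal sketch in Python, not something MALLET provides: the function name split_into_files, the "doc" file prefix, and the example paths are all made up for illustration. It assumes the input is UTF-8 text with one instance per line.

```python
from pathlib import Path


def split_into_files(source: Path, out_dir: Path, prefix: str = "doc") -> int:
    """Write each non-empty line of `source` to its own .txt file in `out_dir`.

    Returns the number of files written. The names (function, prefix) are
    illustrative assumptions, not part of MALLET.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with source.open(encoding="utf-8") as handle:
        for line in handle:
            text = line.strip()
            if not text:
                continue  # skip blank lines so they do not become empty instances
            count += 1
            # Zero-padded numbers keep the files sorted in their original order.
            target = out_dir / f"{prefix}_{count:06d}.txt"
            target.write_text(text + "\n", encoding="utf-8")
    return count
```

For example, split_into_files(Path("data.txt"), Path(r"D:\Data\test1")) would fill the folder with doc_000001.txt, doc_000002.txt and so on, and that folder can then be passed to import-dir as --input.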