How can I use Mahout's sequencefile API code?


Mahout provides a command for creating sequence files: bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>. I want to use this functionality from code, through the API.


You can do something like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;


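Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

// This path is the sequence file to be written, not a directory to fill
Path outputPath = new Path("c:\\temp");

Text key = new Text();   // Example; any other Writable type can be used
Text value = new Text(); // Example; any other Writable type can be used

SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, outputPath, key.getClass(), value.getClass());

try {
    while (condition) { // placeholder: loop while there are records left to write

        key.set("Some text");   // placeholder key
        value.set("Some text"); // placeholder value

        writer.append(key, value);
    }
} finally {
    writer.close(); // close the writer even if an append fails
}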

You can find more information here and here

Additionally, you can call exactly the functionality you described directly from Mahout by using the org.apache.mahout.text.SequenceFilesFromDirectory class.

Then the call looks something like this:

ToolRunner.run(new SequenceFilesFromDirectory(), args); // args: a String[] holding your parameters

ToolRunner is the org.apache.hadoop.util.ToolRunner class.
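For the parameters, here is a minimal sketch of how the argument array could be built. It assumes SequenceFilesFromDirectory accepts the same -c, -i and -o flags as the bin/mahout seqdirectory command in the question. SeqDirectoryArgs and its build method are hypothetical names made up for this example, not part of Mahout, and the paths are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): builds the argument array for seqdirectory.
final class SeqDirectoryArgs {

    // Mirrors the command line: bin/mahout seqdirectory -c <charset> -i <input> -o <output>
    static String[] build(String charset, String input, String output) {
        List<String> args = new ArrayList<>();
        args.add("-c");
        args.add(charset);
        args.add("-i");
        args.add(input);
        args.add("-o");
        args.add(output);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Placeholder paths; replace them with your own input and output locations.
        String[] args = build("UTF-8", "/path/to/input", "/path/to/output");
        System.out.println(String.join(" ", args));

        // With Hadoop and Mahout on the classpath, the array would then be passed on:
        // int exitCode = ToolRunner.run(new SequenceFilesFromDirectory(), args);
    }
}
```

ToolRunner.run returns the tool's exit code as an int and is declared to throw Exception, so the calling method has to handle or declare it.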

Hope this was of help.