How to read SEQ files in Pig

I have M, U, and userRatings part-files as an intermediate result of an ALS matrix factorization process.

The header is:

SEQ. org.apache.hadoop.io.IntWritable%org.apache.mahout.math.VectorWritable

I need to work with those vectors/features to find an explanation for the ALS recommendations (this is a guess). It needs to be done in Pig.

Thanks, Er

There is 1 solution below

Try the link below; it has a lot of examples of how to load, store, and process SEQ (Hadoop SequenceFile) files using Elephant Bird.

Example:

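     -- REGISTER the Elephant Bird jars (including elephant-bird-mahout) and the Mahout math jar first.
     -- Keys are IntWritable and values are VectorWritable, matching the SEQ header in the question.
     -- The three-double schema is only an example; declare one field per ALS feature.
     pair = LOAD '$data' USING com.twitter.elephantbird.pig.load.SequenceFileLoader (
       '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
       '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
     ) AS (key: int, val: (f1: double, f2: double, f3: double));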

http://grepcode.com/file/repo1.maven.org/maven2/com.twitter.elephantbird/elephant-bird-mahout/3.0.1/com/twitter/elephantbird/pig/mahout/VectorWritableConverter.java
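Once U and M are loaded this way, one possible way to "explain" a recommendation is to look at how much each latent feature contributes to the predicted rating, which in ALS is the dot product of the user and item feature vectors. The following is only a rough sketch under several assumptions: `$u_path`, `$m_path` and `$user_id` are hypothetical parameters, the model is assumed to have exactly three features stored as dense vectors, and the required jars are assumed to be registered already.

```pig
-- Sketch only: $u_path, $m_path and $user_id are hypothetical parameters,
-- and the three-feature dense schema is an assumption about the model.
-- The Elephant Bird and Mahout jars must be REGISTERed before these statements.
U = LOAD '$u_path' USING com.twitter.elephantbird.pig.load.SequenceFileLoader (
  '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
  '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
) AS (user_id: int, uf: (f1: double, f2: double, f3: double));

M = LOAD '$m_path' USING com.twitter.elephantbird.pig.load.SequenceFileLoader (
  '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
  '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
) AS (item_id: int, mf: (f1: double, f2: double, f3: double));

-- Keep a single user so the CROSS below stays small.
one_user = FILTER U BY user_id == $user_id;
pairs = CROSS one_user, M;

-- Per-feature contribution to the predicted rating (one term of the dot product each).
contrib = FOREACH pairs GENERATE
  user_id, item_id,
  uf.f1 * mf.f1 AS c1,
  uf.f2 * mf.f2 AS c2,
  uf.f3 * mf.f3 AS c3;

-- The predicted rating is the sum of the per-feature contributions.
scored = FOREACH contrib GENERATE
  user_id, item_id, c1, c2, c3, (c1 + c2 + c3) AS score;

ranked = ORDER scored BY score DESC;
DUMP ranked;
```

The columns `c1` to `c3` show which features drive a high score for a given item. Filtering to one user before the CROSS is a deliberate choice here, since crossing all users with all items can be very expensive.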