I have a bunch of uncompressed protobuf binary log files (*.binlog). In addition to being uncompressed, each file/stream contains a variable number of messages.
I'm trying to load these files into HDFS and query them using Pig.
My question is:
Is it possible to read the uncompressed files using Elephant-Bird at all? I have also tried to read bzip2-encoded files, but my attempts so far give me the "Failed to read from file" error.
This is what I'm trying:
register '/all-libraries/*.jar';
raw_data = load 'file.binlog' using com.twitter.elephantbird.pig.load.ProtobufPigLoader('my_package.My_proto_Class');
The load statement returns without error, but when I run
value = foreach raw_data generate field1; -- doesn't throw an error
dump value; -- throws an error
Pig Stack Trace
ERROR 1066: Unable to open iterator for alias value
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias person_phone_numbers
	at org.apache.pig.PigServer.openIterator(PigServer.java:892)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:541)
	at org.apache.pig.Main.main(Main.java:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
	at org.apache.pig.PigServer.openIterator(PigServer.java:884)
	... 13 more
Can anybody give a hint whether this is possible at all? How can I direct ProtobufPigLoader to read the message length before reading the message?
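To illustrate what "read the message length before reading the message" means, here is a minimal sketch of the framing in plain Python. It assumes each message is preceded by its length as a base-128 varint, which is the layout protobuf's Java writeDelimitedTo produces; the files in question may use a different prefix (for example a fixed 4-byte length). The function names read_varint and iter_delimited are hypothetical, and this is not Elephant-Bird code.

```python
import io


def read_varint(stream):
    """Read one base-128 varint; return None if the stream is already at EOF."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # clean end of stream, no partial varint
            raise EOFError("stream ended inside a varint")
        value = byte[0]
        result |= (value & 0x7F) << shift  # low 7 bits carry the payload
        if not value & 0x80:               # high bit clear: last byte
            return result
        shift += 7


def iter_delimited(stream):
    """Yield the raw bytes of each length-prefixed message in the stream."""
    while True:
        length = read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a message")
        yield payload


if __name__ == "__main__":
    # Three frames with lengths 3, 0 and 2 (assumed sample data).
    sample = io.BytesIO(b"\x03abc\x00\x02hi")
    for raw in iter_delimited(sample):
        print(len(raw), raw)
```

Each yielded payload would then be handed to the generated class's parser (for example ParseFromString in Python or parseFrom in Java); a loader that expects a different container format will not split the stream this way on its own.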