I am trying to write a Pig script that takes an XML file as input, extracts all of the values for one of the child nodes in the file, and passes them through a UDF. This is the script I am running:
REGISTER 'piggybank-0.15.0.jar';
REGISTER 'function.py' USING streaming_python as myFunc;
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();
A = LOAD 'file.xml' using org.apache.pig.piggybank.storage.XMLLoader('Parent') as (x:chararray);
B = FOREACH A GENERATE XPathAll(x, 'Parent/Child', true, true) as (y:tuple);
C = FOREACH B myFunc.func(y);
DUMP C;
I am getting the following error after trying to call the UDF:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 10, column 14> Syntax error, unexpected symbol at or near 'myFunc'
Note: if I DESCRIBE B without declaring its output as a tuple, I get the result B: {()}. Am I passing the wrong thing to myFunc? I can't figure out how to pass the lines from B into myFunc.
I think you are missing the GENERATE keyword in the FOREACH statement that defines C. Can you try changing your code as below: