Pig Script Using XPathAll and Python UDF

I am trying to write a Pig script that takes an XML file as input and returns all of the values for one of the child nodes in my file, after running some UDF. This is the script I am running:

REGISTER 'piggybank-0.15.0.jar';
REGISTER 'function.py' USING streaming_python as myFunc;
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();
A = LOAD 'file.xml' using org.apache.pig.piggybank.storage.XMLLoader('Parent') as (x:chararray);
B = FOREACH A GENERATE XPathAll(x, 'Parent/Child', true, true) as (y:tuple);
C = FOREACH B myFunc.func(y);
DUMP C;

I am getting the following error after trying to call the UDF:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 10, column 14>  Syntax error, unexpected symbol at or near 'myFunc'

Note: if I DESCRIBE B without declaring the result as a tuple, I get B: {()}. Am I calling myFunc incorrectly? I can't figure out how to pass the rows of B into myFunc.

There is 1 answer below.

I think you are missing the GENERATE keyword in your FOREACH statement.

Can you try changing your code as below:

C = FOREACH B GENERATE myFunc.func(y);
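
For completeness, here is a rough sketch of what function.py could look like so that myFunc.func accepts the tuple produced by XPathAll. The question does not show the file, so the body of func (joining the child values with commas) and the output schema name "joined" are assumptions for illustration only. The pig_util import is the helper module Pig provides to streaming_python UDFs; the fallback decorator is only there so the file can also be run outside Pig.

```python
# function.py: hypothetical contents, the original question does not show this file.
try:
    # pig_util is supplied by Pig when the script runs as a streaming_python UDF.
    from pig_util import outputSchema
except ImportError:
    # Fallback no-op decorator so the file can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(f):
            return f
        return decorator


@outputSchema("joined:chararray")
def func(values):
    """Join the child-node texts from XPathAll into one comma-separated string.

    The joining behaviour is an assumption; replace it with the real logic.
    """
    if values is None:
        return None
    # Null fields arrive as None, so skip them.
    return ",".join(str(v) for v in values if v is not None)
```

With a function shaped like this, the corrected line C = FOREACH B GENERATE myFunc.func(y); would hand each row's tuple y to func and emit one chararray per row.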