Store the output of a Pig job in a directory structure derived from the data


I would like to achieve the following:

My input data looks like this:

{
  "metadata": {
    "producerName": "capture_api",
    "producerVersion": "3.0.13"
  },
  "payload": {
    --some payload
  }
}

I would like to use a Pig script to bucket this data into the following layout:

/finalOutputDir/producerName/producerVersion/File.txt

Is there a way to do this? I have tried the MultiStorage function, but that class splits on only one field. I could override its behavior in a MultiStorage subclass, but I wanted to check whether there is an easier option first.
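Whatever ends up writing the files, the step the question needs is deriving the target directory from two fields rather than one. A minimal Java sketch of that step, assuming each record has already been parsed into its `producerName`, `producerVersion` and a serialized output line; `ProducerBucketer`, `Rec`, `dirFor` and `bucket` are hypothetical names, not part of Pig or piggybank:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups already-parsed records into output directories
// laid out as <baseDir>/<producerName>/<producerVersion>.
final class ProducerBucketer {

    // Minimal stand-in for one input record after the JSON has been parsed.
    static final class Rec {
        final String producerName;
        final String producerVersion;
        final String line; // serialized record that would go into File.txt

        Rec(String producerName, String producerVersion, String line) {
            this.producerName = producerName;
            this.producerVersion = producerVersion;
            this.line = line;
        }
    }

    // Builds the directory for one record from two fields instead of one.
    static String dirFor(String baseDir, Rec r) {
        return baseDir + "/" + r.producerName + "/" + r.producerVersion;
    }

    // Groups lines by target directory; LinkedHashMap keeps first-seen key order.
    static Map<String, List<String>> bucket(String baseDir, List<Rec> records) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Rec r : records) {
            out.computeIfAbsent(dirFor(baseDir, r), k -> new ArrayList<>()).add(r.line);
        }
        return out;
    }
}
```

For example, records for `capture_api` 3.0.13 and 3.0.14 end up under `/finalOutputDir/capture_api/3.0.13` and `/finalOutputDir/capture_api/3.0.14`; writing each list to a `File.txt` in its directory is left to whatever storage function is used.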

Answer by kecso:

The piggybank MultiStorage can split the data into multiple folders based on the value of a field (as far as I know, only a single field):

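-- Assumes field 0 of data is producerName; MultiStorage creates one sub-folder per distinct value under $out (no compression, comma-delimited)
STORE data INTO '$out' USING org.apache.pig.piggybank.storage.MultiStorage('$out', '0', 'none', ',');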
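If overriding MultiStorage turns out to be necessary, the part that changes is how the per-record key is computed: the `'0'` argument above names a single split field. A hedged sketch of a multi-index version in plain Java, assuming the tuple's fields are available as a `List` (in an actual subclass they would be read from Pig's `Tuple`); `MultiFieldKey`, `keyFor` and `parseIndexes` are hypothetical names:

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: derive a nested relative directory from several
// field indexes instead of MultiStorage's single split-field index.
final class MultiFieldKey {

    // Joins the values at the given indexes with '/', e.g. [0, 1] -> "name/version".
    static String keyFor(List<?> fields, int[] splitIndexes) {
        StringJoiner key = new StringJoiner("/");
        for (int idx : splitIndexes) {
            if (idx < 0 || idx >= fields.size()) {
                throw new IllegalArgumentException(
                    "split field index " + idx + " out of range for " + fields.size() + " fields");
            }
            String s = String.valueOf(fields.get(idx));
            // Reject values that would add, escape, or collapse directory levels.
            if (s.isEmpty() || s.contains("/") || s.equals(".") || s.equals("..")) {
                throw new IllegalArgumentException("unsafe path component: '" + s + "'");
            }
            key.add(s);
        }
        return key.toString();
    }

    // Parses a comma-separated index list such as "0,1"
    // (MultiStorage likewise receives its single index as a string, e.g. '0').
    static int[] parseIndexes(String spec) {
        String[] parts = spec.split(",");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i].trim());
        }
        return out;
    }
}
```

With `parseIndexes("0,1")` and a tuple whose first two fields are `capture_api` and `3.0.13`, `keyFor` returns `capture_api/3.0.13`, which could then serve as the relative directory under the parent path. Rejecting values that contain `/` keeps a stray field value from silently creating extra directory levels.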