pig latin - load with text qualifier

280 Views Asked by At

I am trying to load a datafile in a pig latin script, Data has 2 columns but there is a text qualifier in the 2nd column and sample data is below :

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"

When I try loading the date as below, 2nd column is not recognized as 1 column

deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );

How can I define the text qualifier while loading the data set ?

1

There are 1 best solutions below

1
On BEST ANSWER

Try this , let me know if you need different output format

input.txt

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200

PigScript:

A = LOAD 'input.txt' AS line;
deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\w+),(.*)$')) as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
DUMP deviceList;

OutPut:

(DEVICE_ID,SUPPORTED_TECH)
(a2334,"GSM900,GSM1500,GSM200")
(a54623,"GSM900,GSM1500")
(a86646,"GSM1500,GSM200")