I have several CSV files and the header is always the first line in the file. What's the best way to get that line out of the CSV file as a string in Pig? Preprocessing with sed, awk etc is not an option.
I've tried loading the file with regular PigStorage and the Piggy bank CsvLoader, but its not clear to me how I can get that first line, if at all.
I'm open to writing an UDF, if that's what it takes.
Disclaimer: I'm not great with Java.
You are going to need a UDF. I'm not sure exactly what you are asking for, but this UDF will take a series of CSV files and turn them into maps, where the keys are the values at the top of the file. This should hopefully be enough of a skeleton so that you can change it into what you want.
The couple of tests I've done remotely and locally indicate that this will work.
Sample output loading a directory with:
Produces this output:
The
()s are the top lines of the file.getNext()requires that we return something, otherwise the file will stop being processed. Therefore they return a null schema.