Pig HBaseStorage customization


How can I customize HBaseStorage for a Pig script? I want to apply some business logic to the data before it is loaded into the script, something like a custom storage class built on top of HBaseStorage.

For example, my row keys have the structure A_B_C. Currently I pass the A_B_C key to HBaseStorage in my Pig script, but I want to apply some logic, such as filtering against a key like A_B_C_D, before the input data reaches the actual Pig script. How can this be done?


There are 2 best solutions below

1
On

You may end up having to look at the HBaseStorage Java class and implement your own classes based on it. Depending on how HBaseStorage and its associated classes are written, this could range from easy (just extend HBaseStorage and override where necessary) to a real headache.

You then have to make sure the .jar containing your code is on the Pig classpath (for example, by adding a REGISTER statement for it in the script).
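As a rough illustration of the kind of logic such a subclass could apply, here is a minimal sketch of a row-key filter in plain Java. The class name `RowKeyFilter`, its methods, and the rule itself (keep keys with exactly four `_`-separated parts and a given first part) are hypothetical and only mirror the A_B_C_D example from the question. The trailing comment shows one way it might be called from an overridden `getNext()`; that part is untested and assumes the row key is loaded as the first field via the `-loadKey` option.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: splits and filters composite row keys such as "A_B_C_D". */
final class RowKeyFilter {
    // Assumed delimiter, taken from the A_B_C example in the question.
    private static final String SEPARATOR = "_";

    private RowKeyFilter() {}

    /** Splits a composite key into its parts; the -1 limit keeps empty trailing parts. */
    static List<String> parts(String rowKey) {
        return Arrays.asList(rowKey.split(SEPARATOR, -1));
    }

    /**
     * Example business rule (an assumption, not from the question):
     * keep only keys with exactly four parts whose first part equals wantedPrefix.
     */
    static Predicate<String> fourPartKeyWithPrefix(String wantedPrefix) {
        return key -> {
            if (key == null) {
                return false;
            }
            List<String> p = parts(key);
            return p.size() == 4 && p.get(0).equals(wantedPrefix);
        };
    }
}

// Possible use inside a subclass of org.apache.pig.backend.hadoop.hbase.HBaseStorage
// (needs the Pig and HBase jars, so it is left as a comment and is untested):
//
//   private static final Predicate<String> KEEP = RowKeyFilter.fourPartKeyWithPrefix("A");
//
//   @Override
//   public Tuple getNext() throws IOException {
//       Tuple t;
//       while ((t = super.getNext()) != null) {
//           // Field 0 is assumed to hold the row key (requires -loadKey).
//           if (KEEP.test(String.valueOf(t.get(0)))) {
//               return t;
//           }
//       }
//       return null; // end of input
//   }
```

Keeping the rule in a small standalone class means it can be unit-tested without a cluster, and the loader subclass stays a thin wrapper around it.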

0
On

I find HBaseStorage to be a real pain, so I write regular Java MapReduce jobs that query HBase and write custom sequence files, which I then read from Pig with a simple custom loader. This saves a ton of time, since a sequence file can be reused many times throughout the day to get quick results, rather than scanning everything in HBase for every Pig script.