Using Hive with Pig through HCatalog issue with TimeStamp datatype

1.2k Views Asked by At

In my dev box, I have MapR 3.0.2, Hive 0.11, HCatLog 0.4.1 & Pig 0.12. Am using HCatlog to read and write Hive tables from Pig (Pig Latin), using standard queries,

A = LOAD 'dbname.tablename' USING org.apache.hcatalog.pig.HCatLoader();

My Hive table contains TIMTESTAMP datatype (few columns), so immediately after this statement in GRUNT, am getting error as

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Type timestamp not present

There is a similar query at : Type conversion pig hcatalog .

I can't change the data-type of my Hive tables. Also I can't change or upgrade any of these tools to newer versions (client policy constraint). How can I cast it to string or any supported data-type, so that HCatalog passes it between Pig & Hive and I can continue further?

3

There are 3 best solutions below

1
On

As per the Hive, Pig and HCatalog version that you are using, timestamp is not supported. And there is no way that you can directly load it into Pig using HCatalog from Hive.

There is a work around, if you create a temp hive table and change the datatype from timestamp to string. This way you will be able to load it into Pig as chararray. Once you have your data loaded in Pig, you can always typecast it.

0
On

It will be supported under hive 0.13, they have an issue about this problem that was already solved, you can see the issue in https://issues.apache.org/jira/browse/HIVE-5814

org.apache.hcatalog.pig.HCatLoader has been deprecated in Hive 0.12. In fact every class in org.apache.hcatalog has been deprecated. All new features are added in org.apache.hive.hcatalog which contains all the classes/methods from org.apache.hcatalog and new APIs.

0
On

If you are using hive 0.13 or later, instead of

A = LOAD 'dbname.tablename' USING org.apache.hcatalog.pig.HCatLoader();

do

A = LOAD 'dbname.tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();

org.apache.hcatalog.pig.HCatLoader is now deprecated. The new class supports the pig datetime type and will convert from hive appropriately.

Note, however, that there will be data loss when using HCatLoader since these represent times in different ways, since pig datetime represents dates to millisecond precision and hive represents to nanosecond.

eg. Hive - > Pig will result in nanoseconds being lost from the timestamp (converted to the nearest millisecond!)