Using Python UDF with Hive

3k Views Asked by Rakesh Adhikesavan At 13 February 2017 at 22:23

I am trying to learn how to use Python UDFs with Hive.

I have a very basic Python UDF here:

import sys
for line in sys.stdin:
    line = line.strip()
    print line

Then I add the file in Hive:

ADD FILE /home/hadoop/test2.py;

Now I run the Hive query:

SELECT TRANSFORM (admission_type_id, description)
USING 'python test2.py'
FROM admission_type;

This works as expected: no changes are made to the fields, and the output is printed as is.

Now, when I modify the UDF by introducing the split function, I get an execution error. How do I debug this, and what am I doing wrong?

New UDF:

import sys
for line in sys.stdin:
    line = line.strip()
    fields = line.split('\t') # when this line is introduced, I get an execution error
    print line


There is 1 best solution below

Jinand On 26 September 2017 at 10:51

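import sys

# Hive sends each row on stdin with the columns separated by tabs.
for line in sys.stdin:
    line = line.strip()
    # Unpack the two columns passed in by TRANSFORM.
    field1, field2 = line.split('\t')
    # Write the row back tab-separated; print(...) with a single
    # argument behaves the same on Python 2 and Python 3.
    print('\t'.join([str(field1), str(field2)]))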
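One thing to watch in the script above: `line.strip()` also removes a trailing tab, so a row whose last column is an empty string splits into a single field and the two-name unpacking raises a ValueError. The following is a minimal sketch of a more defensive variant, assuming Python 3 and exactly two input columns. The names `transform_line` and `run` are hypothetical, chosen for illustration, and the sample rows are made up. Malformed rows are reported on stderr instead of crashing the script.

```python
import io


def transform_line(line, expected_fields=2):
    """Return the tab-joined output row, or None if the row is malformed."""
    # Strip only the newline so that an empty last column is preserved.
    fields = line.rstrip('\n').split('\t')
    if len(fields) != expected_fields:
        return None
    return '\t'.join(field.strip() for field in fields)


def run(stdin, stdout, stderr):
    """Transform every row from stdin; report bad rows on stderr."""
    for line_number, line in enumerate(stdin, 1):
        row = transform_line(line)
        if row is None:
            stderr.write('skipping malformed row %d: %r\n' % (line_number, line))
            continue
        stdout.write(row + '\n')


# Demo with in-memory streams standing in for the pipes Hive would provide.
out, err = io.StringIO(), io.StringIO()
run(io.StringIO('1\tEmergency\n2\t\nbroken row\n'), out, err)
```

In the file that Hive actually runs, the demo lines would be replaced by `import sys` and a call to `run(sys.stdin, sys.stdout, sys.stderr)`.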


SELECT TRANSFORM (admission_type_id, description)
USING 'python test2.py' AS (admission_type_id_new, description_new)
FROM admission_type;
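As for how to debug: by default, TRANSFORM feeds the script tab-separated columns on stdin, one row per line, so the script can be exercised outside Hive, where the Python traceback is directly visible. The sketch below assumes Python 3.7 or later. The helper `run_transform`, the embedded `SCRIPT`, and the sample rows are hypothetical stand-ins, not part of Hive. It writes a script to a temporary file, pipes rows into it the way Hive would, and captures the exit code, stdout, and stderr.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the contents of test2.py (hypothetical, for illustration).
SCRIPT = '''\
import sys
for line in sys.stdin:
    field1, field2 = line.rstrip("\\n").split("\\t")
    print("\\t".join([field1, field2]))
'''


def run_transform(script_source, rows):
    """Run script_source as a child process, feeding it tab-separated rows."""
    payload = ''.join('\t'.join(row) + '\n' for row in rows)
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as handle:
        handle.write(script_source)
        path = handle.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            input=payload,
            capture_output=True,
            text=True,
        )
    finally:
        os.remove(path)
    return proc.returncode, proc.stdout, proc.stderr


# A well-formed input: two columns per row.
code, out, err = run_transform(SCRIPT, [('1', 'Emergency'), ('2', 'Urgent')])

# A malformed input: one column only, so the unpacking fails and
# the traceback ends up in bad_err.
bad_code, bad_out, bad_err = run_transform(SCRIPT, [('only-one',)])
```

Printing `bad_err` shows the full traceback, which is usually more informative than the generic execution error that Hive reports when the child process exits with a non-zero status.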
