Register python custom UDF in hive

741 Views Asked by At

My problem is registering the python UDF in hive.

I created an encryption and decryption python code to use in hive query. These are working as expected. however I do not want to add files every time I use it, instead would like to make a permanent registry in udf.

I see the java jar file can be added to udf list using "Create temporary Function ". I do not see such thing for python.

appreaciate any quick help.

Below are the scripts that is working for me.

vi decry.py 

    import base64
    import sys
    
    try:
        for line in sys.stdin:
            line=line.strip()
            (a,b)=line.split('\t')
            print('\t'.join([a,b,base64.b64decode(bytes(a.replace('\n',''),encoding='utf-8')).decode()]))
    except:
        print(sys.exc_info())

vi encry.py


    import base64
    import sys
    
    try:
        for line in sys.stdin:
            line=line.strip()
            (a,b)=line.split('\t')
            print('\t'.join([a,b,base64.b64encode(bytes(a.replace('\n',''),encoding='utf-8')).decode()]))
    except:
        print(sys.exc_info())

   

Testing on Database

    create database test; 
    use test; 
    create table abc (id integer,name string); 
    insert into abc values(1,'vijay'); 
    insert into abc values(2,'vaasista');
    add FILE /hadoop/encry.py; 
    add FILE /hadoop/decry.py;
    select transform(id) using 'python3 encry.py' as id from abc;
    select transform(id,name) using 'python3 encry.py' as id,name,ct from
    abc;
1

There are 1 best solutions below

2
leftjoin On

First of all, base64 is not encryption, it is encoding in different character set (radix-64 - base) and everyone can reverse it, because no secret key is used.

Second: Unfortunately, the only way of using Python UDF is TRANSFORM in Hive, CREATE [TEMPORARY] FUNCTION does not support Python.

Consider using using native functions for better performance and flexibility, like this:

--encode
base64(encode(regexp_replace(text_col,'\\n',''), 'UTF-8'))    
--decode
decode(unbase64(str),'UTF-8')

Demo:

select base64(encode(regexp_replace('My test \n String','\\n',''), 'UTF-8'))

Result:

TXkgdGVzdCAgU3RyaW5n

Decode:

select decode(unbase64('TXkgdGVzdCAgU3RyaW5n'), 'UTF-8')

Result:

My test  String

Also there are aes_encrypt and aes_decrypt functions available in Hive if you need encryption.