My problem is registering the python UDF in hive.
I created an encryption and decryption python code to use in hive query. These are working as expected. however I do not want to add files every time I use it, instead would like to make a permanent registry in udf.
I see the java jar file can be added to udf list using "Create temporary Function ". I do not see such thing for python.
appreaciate any quick help.
Below are the scripts that is working for me.
vi decry.py
import base64
import sys
try:
for line in sys.stdin:
line=line.strip()
(a,b)=line.split('\t')
print('\t'.join([a,b,base64.b64decode(bytes(a.replace('\n',''),encoding='utf-8')).decode()]))
except:
print(sys.exc_info())
vi encry.py
import base64
import sys
try:
for line in sys.stdin:
line=line.strip()
(a,b)=line.split('\t')
print('\t'.join([a,b,base64.b64encode(bytes(a.replace('\n',''),encoding='utf-8')).decode()]))
except:
print(sys.exc_info())
Testing on Database
create database test;
use test;
create table abc (id integer,name string);
insert into abc values(1,'vijay');
insert into abc values(2,'vaasista');
add FILE /hadoop/encry.py;
add FILE /hadoop/decry.py;
select transform(id) using 'python3 encry.py' as id from abc;
select transform(id,name) using 'python3 encry.py' as id,name,ct from
abc;
First of all,
base64is not encryption, it is encoding in different character set (radix-64 - base) and everyone can reverse it, because no secret key is used.Second: Unfortunately, the only way of using Python UDF is TRANSFORM in Hive, CREATE [TEMPORARY] FUNCTION does not support Python.
Consider using using native functions for better performance and flexibility, like this:
Demo:
Result:
Decode:
Result:
Also there are aes_encrypt and aes_decrypt functions available in Hive if you need encryption.