Solution for "select transform" with a Python UDF in Hive


Is there a way to avoid listing every column in select transform () and still get all the columns in the output?

For example, I have a Hive table with these columns:

c1, c2, c3, c4, c5, c6, c7, c8, c9, c10

I am performing a transform on columns c8, c9, c10, and the output should contain c1, c2, c3, c4, c5, c6, c7, co, where co is the result of the transform on c8, c9, c10.

One way I can do this is:

select transform (c1,c2,c3,c4,c5,c6,c7,c8,c9,c10)
using 'python udf_name'
as (c1,c2,c3,c4,c5,c6,c7,co)
from table_name;

The problem is that I don't want to pass all the columns in select transform: there are nearly 900 columns in my table, and it becomes difficult to tell which columns the UDF actually works on.

Example:

#temp
c1, c2, c3, c4  
 a,  1,  0, 5  
 b,   ,  8, 9  

Now I want to find the first non-zero, non-null value among columns c2, c3, c4 and print it together with column c1.

Here is the Python UDF:

test.py:

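import sys

# Hive streams each row to stdin as one tab-separated line; NULL arrives as \N.
for line in sys.stdin:
    c = line.rstrip('\n').split('\t')
    # c[0] is c1; scan the remaining columns for the first non-zero integer.
    for i in range(1, len(c)):
        try:
            if int(c[i]) != 0:
                # Emit c1 and the value, tab-separated, to match "as (c1,co)".
                print(c[0] + '\t' + c[i])
                break
        except ValueError:
            # Non-numeric field (such as \N): skip it.
            pass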

I can achieve this by passing all the columns:

select transform (c1,c2,c3,c4)
using 'python test.py'
as (c1,co)
from temp

Output:

c1, co  
 a,  1  
 b,  8  

Problem: I don't want to pass all the columns in select transform, as I have 900 columns.

Basically I want to pass only those columns which are involved in the UDF and not all the columns.
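To make the goal concrete, here is a rough sketch of a helper that builds the statement from only a key column plus the columns the UDF reads. The function build_transform_query and its parameters are hypothetical. Note the limitation: the generated query returns only the key columns and co, so getting the remaining columns back would still need something like a join on a unique key, which is an assumption about the table and not something stated above.

```python
def build_transform_query(table, key_cols, udf_cols, script, out_col="co"):
    """Build a select transform statement that passes only the needed columns.

    key_cols: columns passed through unchanged (hypothetical row identifier).
    udf_cols: columns the UDF script actually reads.
    """
    passed = ", ".join(key_cols + udf_cols)
    returned = ", ".join(key_cols + [out_col])
    return (
        "select transform ({0})\n"
        "using 'python {1}'\n"
        "as ({2})\n"
        "from {3};"
    ).format(passed, script, returned, table)


# For the small example table: c1 is the key, c2..c4 feed the UDF.
query = build_transform_query("temp", ["c1"], ["c2", "c3", "c4"], "test.py")
```

For the 900-column table the call would name only c1 (or whatever identifies a row) and c8, c9, c10, which at least keeps the list of UDF inputs explicit in one place.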
