Is there a way to avoid listing every column in select transform () and still get all the columns in the output?
For example, I have columns in a Hive table like:
c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
I am performing a transform on columns c8, c9, c10,
and the output contains c1, c2, c3, c4, c5, c6, c7, co
where co = the output of the transform on columns c8, c9, c10.
There is a way I can do this:
select transform (c1,c2,c3,c4,c5,c6,c7,c8,c9,c10)
using 'python udf_name'
as (c1,c2,c3,c4,c5,c6,c7,co)
from table_name;
The problem is that I don't want to pass all the columns to select transform, because there are nearly 900 columns in my table and it is difficult to figure out which columns the UDF works on.
Example:
#temp
c1, c2, c3, c4
a, 1, 0, 5
b, , 8, 9
Now I want to find the first non-zero, non-null value from columns c2, c3, c4
and print it with column c1.
Here is the Python UDF,
test.py:
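import sys

# Hive streams each row to stdin as one tab-separated line.
for line in sys.stdin:
    c = line.rstrip("\n").split("\t")
    # Scan c2 onwards for the first non-zero integer value.
    for i in range(1, len(c)):
        try:
            if int(c[i]) != 0:
                # Emit c1 and the value as the two output columns (c1, co).
                print(c[0] + "\t" + c[i])
                break
        except ValueError:
            # Skip empty or NULL fields that are not integers.
            pass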
I can achieve this by passing all the columns:
select transform (c1,c2,c3,c4)
using 'python test.py'
as (c1,co)
from temp
output :
c1, co
a, 1
b, 8
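For reference, the logic of test.py can be checked locally without Hive. The following is only a minimal sketch: it assumes Hive streams each row to the script as a tab-separated line with NULL written as \N, and the names first_nonzero and transform are made up for illustration (they are not part of the original script).

```python
def first_nonzero(values):
    """Return the first value that parses as a non-zero integer, else None."""
    for v in values:
        try:
            if int(v) != 0:
                return v
        except ValueError:
            # Not an integer: an empty string or the assumed NULL marker \N.
            continue
    return None


def transform(lines):
    """Mimic the UDF: yield 'c1<TAB>co' for each tab-separated input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        value = first_nonzero(fields[1:])
        # Like test.py, emit nothing when no non-zero value is found.
        if value is not None:
            yield fields[0] + "\t" + value


# Rows of #temp as they are assumed to arrive on stdin.
rows = ["a\t1\t0\t5\n", "b\t\\N\t8\t9\n"]
for out in transform(rows):
    print(out)
```

Only the columns handed to transform() reach the script, which is why the sample query passes c1 along with c2, c3, c4.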
Problem: I don't want to pass all the columns to select transform, as I have 900 columns.
Basically, I want to pass only the columns that are involved in the UDF, not all of them.