case statement in Spark SQL

11.9k Views Asked by At

I am working on a workflow for my company. Therefore I need to use a Spark SQL case-statement to filter something.

I have a column called OPP_amount_euro (the amount of money used for something is saved there) and I have a column called OPP_amount_euro_binned (default value is 1). So I want to program some kind of interval. If the value in OPP_amount_euro is < 30000 the value in OPP_amount_euro_binned should be 1, and so on.

I already tried to find a solution, but it´s not the best one.

select
case when OPP_amount_eur < 30000 then 1
when OPP_amount_eur >= 30000 then 2
when OPP_amount_eur >= 50000 then 3
when OPP_amount_eur >= 100000 then 4
when OPP_amount_eur >= 300000 then 5
when OPP_amount_eur >= 500000 then 6
when OPP_amount_eur >= 1000000 then 7
end as OPP_amount_eur_binned
from inputTable

so this code is working well but I can´t select any further columns in the table. If I write a '*' after the select I will get following error message:

Exception in processing: ParseException: mismatched input 'when' expecting {, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 2, pos 5) == SQL == Select * case when OPP_amount_eur < 30000 then 1 -----^^^ when OPP_amount_eur >= 30000 then 2 when OPP_amount_eur >= 50000 then 3 when OPP_amount_eur >= 100000 then 4 when OPP_amount_eur >= 300000 then 5 when OPP_amount_eur >= 500000 then 6 when OPP_amount_eur >= 1000000 then 7 end as OPP_amount_eur_binned from temptable3083b308bcec4124b6a4650f2bb40695

Why I can´t do this? I searched on the internet for it and in normal SQL it seems to work, why this is not possible in Spark SQL? Is there any solution?

I am sorry for my bad description, but I am absolutely new here and also I never had contact with Spark SQL. I am in my traineeship as a student.

2

There are 2 best solutions below

0
On BEST ANSWER

This is the solution for my problem

 Select inputTable.*,

case 
     when OPP_amount_eur between 0 and 30000 then 1
     when OPP_amount_eur between 30000 and 50000 then 2
     when OPP_amount_eur between 50000 and 100000 then 3
     when OPP_amount_eur between 100000 and 300000 then 4
     when OPP_amount_eur between 300000 and 500000 then 5
     when OPP_amount_eur between 500000 and 1000000 then 6
     else '7'

     end as OPP_amount_eur_binned

from inputTable
0
On

You should use aliases :

SELECT CASE....,
       t.*
FROM YourTable t