I am working on a workflow for my company, and I need a Spark SQL case statement to bin a value.
I have a column called OPP_amount_euro (the amount of money used for something is saved there) and a column called OPP_amount_euro_binned (default value is 1). I want to assign interval bins: if the value in OPP_amount_euro is < 30000, the value in OPP_amount_euro_binned should be 1, and so on.
I already tried to find a solution, but it's not the best one:
select
case when OPP_amount_eur >= 1000000 then 7
     when OPP_amount_eur >= 500000 then 6
     when OPP_amount_eur >= 300000 then 5
     when OPP_amount_eur >= 100000 then 4
     when OPP_amount_eur >= 50000 then 3
     when OPP_amount_eur >= 30000 then 2
     else 1
end as OPP_amount_eur_binned
from inputTable
This code works, but I can't select any further columns from the table. If I write a '*' after the select, I get the following error message:
Exception in processing: ParseException:
mismatched input 'when' expecting {<EOF>, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'} (line 2, pos 5)

== SQL ==
Select *
case when OPP_amount_eur < 30000 then 1
-----^^^
when OPP_amount_eur >= 30000 then 2
when OPP_amount_eur >= 50000 then 3
when OPP_amount_eur >= 100000 then 4
when OPP_amount_eur >= 300000 then 5
when OPP_amount_eur >= 500000 then 6
when OPP_amount_eur >= 1000000 then 7
end as OPP_amount_eur_binned
from temptable3083b308bcec4124b6a4650f2bb40695
Why can't I do this? I searched the internet, and in standard SQL it seems to work, so why is it not possible in Spark SQL? Is there any solution?
I am sorry for my bad description, but I am absolutely new here and have never worked with Spark SQL before. I am a student in my traineeship.
This is the solution to my problem:
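The parse error comes from the select list, not from the CASE expression itself: in SQL, each item in the select list must be separated by a comma, and `*` counts as one item. So `select * case when ...` is invalid, while `select *, case when ...` works. A corrected version of the query (with the highest threshold checked first, since CASE evaluates its branches in order) looks like this:

```sql
select *,
       case when OPP_amount_eur >= 1000000 then 7
            when OPP_amount_eur >= 500000  then 6
            when OPP_amount_eur >= 300000  then 5
            when OPP_amount_eur >= 100000  then 4
            when OPP_amount_eur >= 50000   then 3
            when OPP_amount_eur >= 30000   then 2
            else 1
       end as OPP_amount_eur_binned
from inputTable
```

This returns every column of inputTable plus the new OPP_amount_eur_binned column. The comma after `*` is all that was missing; the same rule applies in standard SQL as well.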