I have some records where every row belongs to some categories (data type - array of string) and a separate list of unique category(data type - string). I need to match every row with unique list and create flags for it.
Input:
------
ID Category
1 ["Physics","Math"]
2 ["Math"]
3 ["Math,"Chemistry"]
4 ["Physics","Computer"]
Now I have separate list of unique list of category in excel in local like below:
Unique Category
["Physics"]
["Math"]
["Chemistry"]
["Computer"]
Final Output should look like this:
ID Category Math_F Physics_F Computer_F Chemistry_F
1 ["Physics","Math"] 1 1 0 0
2 ["Math"] 1 0 0 0
3 ["Math,"Chemistry"] 1 0 0 1
4 ["Physics","Computer"] 0 1 1 0
Can someone please help with query, steps and explanation. I am new to Hive.
Using
array_contains()
:And if you want columns to be constructed dynamically using your array of unique categories, then use some other tool for building query. For example it can be done using shell script.
See this example of building SQL based on pre-defined array. You can easily add your array reading from the file:
Result:
You can add Hive call to the above script:
hive -e "$sql"
to execute it, or save it to the file instead.