I have a Spark DataFrame with the columns below.
C1 | C2 | C3 | C4
1  | 2  | 3  | S1
2  | 3  | 3  | S2
4  | 5  | 3  | S2
I want to add another column, C5, that holds the distinct values of column C4 in every row, like this:
C5
[S1,S2]
[S1,S2]
[S1,S2]
Can somebody help me achieve this with a Spark DataFrame in Scala?
You might want to first collect the distinct items from column C4 into a List, and then use withColumn to create the new column C5 by creating a udf that always returns a constant list:
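A minimal sketch of that approach, assuming a local SparkSession, that C4 is a non-null string column, and that its distinct values are few enough to collect to the driver. The object name DistinctC4Example and the helper withDistinctC4 are made up for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

object DistinctC4Example {

  // Hypothetical helper: adds a column C5 holding the distinct values of C4.
  def withDistinctC4(df: DataFrame): DataFrame = {
    // Step 1: collect the distinct C4 values to the driver as a plain list.
    // Sorting is only there to make the order deterministic.
    val distinctC4: Seq[String] =
      df.select("C4").distinct().collect().map(_.getString(0)).sorted.toList

    // Step 2: a zero-argument UDF that always returns that constant list.
    val constantList = udf(() => distinctC4)

    // Step 3: attach the list to every row as column C5.
    df.withColumn("C5", constantList())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-c4")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question.
    val df = Seq(
      (1, 2, 3, "S1"),
      (2, 3, 3, "S2"),
      (4, 5, 3, "S2")
    ).toDF("C1", "C2", "C3", "C4")

    withDistinctC4(df).show(false)

    spark.stop()
  }
}
```

If you would rather avoid a UDF, org.apache.spark.sql.functions.typedLit(distinctC4) should give the same constant array column on Spark 2.2 or later.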