I have my rows set up in a JavaPairRDD<String, MyPojo>, where MyPojo is a POJO with an attribute (let's call it HashSet<String> values).
Now I want to cluster (merge) my rows whenever their MyPojo.values sets have any intersection.
For example:
<Row K1 : MyPojo (values: [A,B,C])>
<Row K2 : MyPojo (values: [A,B])>
<Row K3 : MyPojo (values: [D,E,F])>
I want to merge the rows with keys K1 and K2.
If the keys whose values intersect have to be found, an approach like this can be used:
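As a minimal sketch of one such approach, here is the grouping logic in plain Java, assuming the rows fit in a local Map<String, Set<String>> rather than a JavaPairRDD. The class name IntersectionClusters and the methods cluster and find are made up for illustration and are not part of Spark. The sketch uses a simple union-find, so keys are also grouped transitively (K1 overlaps K2 and K2 overlaps K4 puts all three together), which may or may not be what you want.

```java
import java.util.*;

public class IntersectionClusters {

    // Follows parent links to the representative key of a group.
    static String find(Map<String, String> parent, String key) {
        while (!parent.get(key).equals(key)) {
            key = parent.get(key);
        }
        return key;
    }

    // Groups keys whose value sets share at least one element (transitively)
    // and returns, per group of keys, the union of the member value sets.
    static Map<Set<String>, Set<String>> cluster(Map<String, Set<String>> rows) {
        Map<String, String> parent = new HashMap<>();
        Map<String, String> firstKeyForValue = new HashMap<>();
        for (String key : rows.keySet()) {
            parent.put(key, key); // every key starts as its own group
        }
        for (Map.Entry<String, Set<String>> row : rows.entrySet()) {
            for (String value : row.getValue()) {
                String seen = firstKeyForValue.putIfAbsent(value, row.getKey());
                if (seen != null) {
                    // Another row already holds this value: join the two groups.
                    parent.put(find(parent, row.getKey()), find(parent, seen));
                }
            }
        }
        Map<String, Set<String>> keysByRoot = new LinkedHashMap<>();
        Map<String, Set<String>> valuesByRoot = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet()) {
            String root = find(parent, row.getKey());
            keysByRoot.computeIfAbsent(root, r -> new TreeSet<>()).add(row.getKey());
            valuesByRoot.computeIfAbsent(root, r -> new TreeSet<>()).addAll(row.getValue());
        }
        Map<Set<String>, Set<String>> merged = new LinkedHashMap<>();
        for (String root : keysByRoot.keySet()) {
            merged.put(keysByRoot.get(root), valuesByRoot.get(root));
        }
        return merged;
    }

    public static void main(String[] args) {
        // The three example rows from the question.
        Map<String, Set<String>> rows = new LinkedHashMap<>();
        rows.put("K1", new HashSet<>(Arrays.asList("A", "B", "C")));
        rows.put("K2", new HashSet<>(Arrays.asList("A", "B")));
        rows.put("K3", new HashSet<>(Arrays.asList("D", "E", "F")));
        // prints {[K1, K2]=[A, B, C], [K3]=[D, E, F]}
        System.out.println(cluster(rows));
    }
}
```

On a real RDD the same idea would need a distributed formulation (for example, inverting to value-to-key pairs and computing connected components), so treat this only as a description of the expected result.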
Output is: