How to multiply each row in RDD with each other?

I have an RDD that looks like this:

CELL-ID | COUNT
--------------
abcd       10
DEF        20
ghi        15

I need to get an RDD with

CELL-ID-1 | CELL-ID-2 | PRODUCT
--------------
abcd       DEF            200
abcd       ghi            150
DEF        abcd           200
DEF        ghi            300
...
....

How can this be done? I've tried to use a Cartesian product but couldn't get the output:

val result = orginalRDD.cartesian(orginalRDD).collect {
  case ((t1: _,Int), (t2: _,Int)) if t1 != t2 => t1 * t2
}

There is 1 best solution below

You can either make t1 and t2 represent the tuples (entire "records"):

val result = orginalRDD.cartesian(orginalRDD).collect {
  case (t1: (String, Int), t2: (String, Int)) if t1 != t2 => (t1._1, t2._1, t1._2 * t2._2)
}

Or you can do the same but use pattern matching to break the tuples up further:

val result = orginalRDD.cartesian(orginalRDD).collect {
  case (t1 @ (s1, i1), t2 @ (s2, i2)) if t1 != t2 => (s1, s2, i1 * i2)
}

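Your attempt looks like a mix of both forms at once: in (t1: _, Int), t1 sits in the position of the cell ID and the count is never bound to a name, so t1 * t2 has no counts to multiply.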
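
If you want to check the pattern without a Spark context, here is a minimal sketch that applies the same partial function to plain Scala collections. It is an illustration under assumptions, not Spark code: the names PairProducts, cells and pairs are made up for this example, and the for comprehension only stands in for what cartesian produces on an RDD (every ordered pair of rows).

```scala
object PairProducts {
  // Sample rows from the question: (cell id, count).
  val cells: Seq[(String, Int)] = Seq(("abcd", 10), ("DEF", 20), ("ghi", 15))

  // Stand-in for RDD.cartesian (an assumption for local testing):
  // every ordered pair of rows, including each row paired with itself.
  val pairs: Seq[((String, Int), (String, Int))] =
    for (a <- cells; b <- cells) yield (a, b)

  // Same partial function as in the second form of the answer:
  // t1/t2 name the whole tuples, s1/i1 and s2/i2 name their fields.
  val result: Seq[(String, String, Int)] = pairs.collect {
    case (t1 @ (s1, i1), t2 @ (s2, i2)) if t1 != t2 => (s1, s2, i1 * i2)
  }

  def main(args: Array[String]): Unit =
    result.foreach(println)
}
```

With the three sample rows this leaves six ordered pairs, since the guard drops each row paired with itself. Note that t1 != t2 compares whole tuples, so two distinct rows that happened to share both ID and count would also be dropped.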