Avoid nested RDDs in Spark without using an Array


I have a big problem!

I have an RDD[(Int, Vector)], where the Int is a sort of label.

For example:

(0, (a,b,c) );
(0, (d,e,f) );
(1, (g,h,i) )

etc...

Now, I need to use this RDD (I'll call it myrdd) like this:

myrdd.map { case (l, v) =>
  myrdd.map { case (l_, v_) =>
    compare(v, v_)
  }
}

Now, I know that nesting RDD operations like this is impossible in Spark.

I could bypass the problem by collecting the RDD into an Array, but my data is too big for an Array, or for anything else that has to fit in memory.
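To make it concrete, here is a rough sketch of the Array bypass I mean (assuming my vectors are Vector[Double] and compare returns a Double); collect() pulls the whole RDD into driver memory, which is exactly what I can't afford:

val local: Array[(Int, Vector[Double])] = myrdd.collect() // whole RDD on the driver
val results = myrdd.map { case (l, v) =>
  local.map { case (l_, v_) => compare(v, v_) }
}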

How can I solve this problem WITHOUT USING AN ARRAY?

Thanks in advance!!!

1 Answer

cartesian sounds like it should work:

myrdd.cartesian(myrdd).map {
  case ((_, v), (_, v_)) => compare(v, v_)
}
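For completeness, here is a runnable sketch of this approach in a spark-shell session (so sc already exists); the sample data, the Vector[Double] element type, and the compare implementation are placeholder assumptions, not from the question:

// Placeholder compare: squared Euclidean distance between two vectors.
def compare(a: Vector[Double], b: Vector[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// Toy stand-in for myrdd: (label, vector) pairs.
val myrdd = sc.parallelize(Seq(
  (0, Vector(1.0, 2.0, 3.0)),
  (0, Vector(4.0, 5.0, 6.0)),
  (1, Vector(7.0, 8.0, 9.0))
))

// cartesian pairs every element with every element (n * n pairs in total),
// so all comparisons run on the executors and nothing is collected to the driver.
val comparisons = myrdd.cartesian(myrdd).map {
  case ((_, v), (_, v_)) => compare(v, v_)
}

comparisons.take(5).foreach(println)

Keep in mind that cartesian produces n * n pairs, so while nothing has to fit in driver memory, the amount of work (and shuffle) grows quadratically with the size of the RDD.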