Just for transformation, map and foreachRDD can achieve the same goal, but which one is more efficient? And why?
for example,for a DStream[Int]:
val newDs1=Ds.map(x=> x+1)
val newDs2=Ds.foreachRDD (rdd=>rdd.map(x=> x+1))
I know foreachRDD will operate on the RDD directly, but map seams to transform DStream to RDD first(not sure), thus foreachRDD seams more efficient than map. However, map is a Transformations Operation while foreachRDD is a Output Operations. Thus, map should be more efficient than foreachRDD while doing transformation. Anybody knows which one is right and why? Thanks for any reply.
Add one more comparison:
val newDS3=Ds.transform (rdd=>rdd.map(x=> x+1))
which is more efficient for transformation?
You could answer this question yourself if you checked the types.
foreachRDD
isUnit
so what you have is:You not only don't have
DStream[_]
, but internalmap
is never executed (it is lazy).Following two:
are identical in terms of execution, so it doesn't make sense to use the latter one, which is unnecessarily verbose.