I want to create an RDD in which each row has an index. I tried the following.
Given an RDD:
["a" "b" "c"]
(defn make-row-index [input]
  (let [{:keys [col]} input]
    (swap! @rdd assoc :rdd (-> (:rdd xctx)
                               (f/map #(vector %1 %2) (range))))))
Desired output:
(["a" 0] ["b" 1] ["c" 2])
I got an arity error, because f/map expects its arguments as (f/map rdd fn).
I wanted to use zipWithUniqueId() from Apache Spark, but I'm lost on how to implement this and I can't find an equivalent function in flambo. Any suggestions or help would be appreciated.
Thanks
You can simply call zipWithIndex followed by map using untuple.
You can use .zipWithUniqueId in exactly the same way, but the result will be different from what you expect: zipWithUniqueId will generate pairs, but the index field won't be ordered.
It should also be possible to use zip, but as far as I can tell it doesn't work with an infinite range.
Whenever you use zip you should be careful, though. Generally speaking, an RDD should be considered an unordered collection if any shuffling is involved.
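A minimal sketch of the three approaches mentioned above, assuming flambo is required as f and flambo.conf as conf. The namespace name, the local master setting, the app name, and the var names (sc, rdd, indexed, unique, idx, zipped) are made up for illustration. zipWithIndex, zipWithUniqueId and zip are called through Java interop on the underlying JavaRDD. Depending on your flambo version, you may need to wrap untuple as (f/fn [t] (f/untuple t)) instead of passing it directly.

```clojure
(ns example.row-index
  (:require [flambo.api :as f]
            [flambo.conf :as conf]))

;; Hypothetical local context, for illustration only.
(def sc
  (f/spark-context (-> (conf/spark-conf)
                       (conf/master "local[2]")
                       (conf/app-name "row-index-sketch"))))

(def rdd (f/parallelize sc ["a" "b" "c"]))

;; 1. zipWithIndex returns a JavaPairRDD of (element, index) tuples;
;;    f/untuple turns each Tuple2 into a Clojure vector.
(def indexed (f/map (.zipWithIndex rdd) f/untuple))
(f/collect indexed) ; should give the pairs ["a" 0] ["b" 1] ["c" 2]

;; 2. zipWithUniqueId is called the same way, but the ids are only
;;    unique, not consecutive or ordered across partitions.
(def unique (f/map (.zipWithUniqueId rdd) f/untuple))

;; 3. zip against a finite index RDD of the same length. Both RDDs must
;;    have the same number of partitions and the same number of elements
;;    per partition, which is why an infinite (range) cannot be used.
(def idx (f/parallelize sc (range (f/count rdd))))
(def zipped (f/map (.zip rdd idx) f/untuple))
```

For zipWithUniqueId, Spark assigns the items in the k-th of n partitions the ids k, n+k, 2n+k and so on, so the ids depend on how the data is partitioned. zipWithIndex gives consecutive indices, but it has to run an extra Spark job when the RDD has more than one partition.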