zipWithUniqueId() in flambo using clojure

218 Views Asked by Jyd At 24 June 2025 at 22:32

I want to create an RDD in which each row has an index. I tried the following.

Given an RDD:

["a" "b" "c"] 

(defn make-row-index [input]
  (let [{:keys [col]} input]
    (swap! @rdd assoc :rdd (-> (:rdd xctx)
                          (f/map #(vector %1 %2 ) (range))))))

Desired output:

 (["a" 0] ["b" 1] ["c" 2])

I got an arity error, since f/map is called as (f/map rdd fn). I wanted to use zipWithUniqueId() from Apache Spark, but I'm lost on how to implement it, and I can't find an equivalent function in flambo. Any suggestions and help are appreciated.
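The attempt above borrows the multi-collection arity of clojure.core/map, which pairs elements from several sequences. Because of the -> threading, the call expands to (f/map rdd fn (range)), three arguments, while flambo's f/map takes exactly an RDD and a one-argument function; that is where the arity error comes from. For a local (non-Spark) sequence, the intended pairing looks like this in plain Clojure (the name `letters` is just for illustration):

```clojure
;; Plain Clojure, no Spark involved.
;; clojure.core/map accepts several collections and stops at the shortest,
;; so pairing a finite vector with the infinite (range) works locally.
(def letters ["a" "b" "c"])

(map vector letters (range))
;; => (["a" 0] ["b" 1] ["c" 2])

;; map-indexed passes the index first, so the pair is built as [x i].
(map-indexed (fn [i x] [x i]) letters)
;; => (["a" 0] ["b" 1] ["c" 2])
```

An RDD has no equivalent of this multi-collection map, which is why the answer below goes through Spark's own zip methods instead.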

Apache-spark zip with Index

Map implementation in flambo

Thanks

zero323 On 03 August 2015 at 16:02 BEST ANSWER

You can simply call zipWithIndex and then map f/untuple over the result:

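(require '[flambo.api :as f])              ; sc is an existing flambo spark context
(def rdd (f/parallelize sc ["a" "b" "c"]))
;; zipWithIndex yields Tuple2 [element index] pairs; f/untuple turns each into a vector
(f/map (.zipWithIndex rdd) f/untuple)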
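To see why zipWithIndex gives the consecutive indexes from the desired output, here is a rough plain-Clojure model of how it numbers elements. It assumes a partitioned RDD can be pictured as a vector of vectors (one inner vector per partition); `index-partitions` is a hypothetical helper, not part of flambo or Spark. Each partition starts counting at the total size of all earlier partitions, which is also why Spark has to run a job to count partitions first when the RDD has more than one partition.

```clojure
;; Hypothetical helper, plain Clojure (no Spark): a partitioned collection is
;; modelled as a vector of vectors, one inner vector per partition.
(defn index-partitions
  "Numbers elements the way zipWithIndex does: each partition starts at the
  total size of all earlier partitions, then counts up within the partition."
  [partitions]
  (let [starts (reductions + 0 (map count partitions))]
    (vec (mapcat (fn [start part]
                   (map-indexed (fn [i x] [x (+ start i)]) part))
                 starts
                 partitions))))

(index-partitions [["a" "b"] ["c"]])
;; => [["a" 0] ["b" 1] ["c" 2]]
```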

You can use .zipWithUniqueId exactly the same way, but the result will differ from what you expect: zipWithUniqueId also generates [element id] pairs, but the ids are not consecutive and, with more than one partition, won't follow the order of the elements.
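Spark's API documentation describes the zipWithUniqueId scheme: items in the kth partition get ids k, n+k, 2n+k, ..., where n is the number of partitions. The sketch below models that scheme in plain Clojure, again assuming a vector-of-vectors picture of the partitions; `unique-id-partitions` is a hypothetical helper. It shows both effects: ids out of element order, and gaps.

```clojure
(defn unique-id-partitions
  "Hypothetical helper, plain Clojure (no Spark): assigns ids the way
  zipWithUniqueId is documented to, i.e. element i of partition k gets
  k + i*n, where n is the number of partitions."
  [partitions]
  (let [n (count partitions)]
    (vec (apply concat
                (map-indexed (fn [k part]
                               (map-indexed (fn [i x] [x (+ k (* i n))]) part))
                             partitions)))))

;; Ids are unique but not in element order:
(unique-id-partitions [["a" "b"] ["c"]])
;; => [["a" 0] ["b" 2] ["c" 1]]

;; Ids can have gaps when partitions are unevenly filled:
(unique-id-partitions [["a" "b" "c"] []])
;; => [["a" 0] ["b" 2] ["c" 4]]
```

No coordination between partitions is needed here, which is the trade-off: unlike zipWithIndex, this scheme does not require counting the partitions first.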

It should also be possible to use zip, but as far as I can tell it doesn't work with an infinite range, so you have to build a finite index RDD first:

;; zip needs the same number of partitions and of elements per partition on both sides
(def idx (f/parallelize sc (range (f/count rdd))))
(f/map (.zip rdd idx) f/untuple)
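Spark's documentation for RDD.zip states that it assumes both RDDs have the same number of partitions and the same number of elements in each partition, because pairing happens partition by partition. The plain-Clojure sketch below models that behaviour under the same vector-of-vectors assumption as before; `zip-partitions` is a hypothetical helper, and its error message is illustrative, not Spark's.

```clojure
(defn zip-partitions
  "Hypothetical helper, plain Clojure (no Spark): pairs two partitioned
  collections partition by partition, as RDD.zip does, and rejects inputs
  whose partition layouts (count and sizes) differ."
  [left right]
  (when-not (= (map count left) (map count right))
    (throw (ex-info "partition layouts differ"
                    {:left (map count left) :right (map count right)})))
  (vec (mapcat (fn [l r] (map vector l r)) left right)))

;; Matching layouts: pairs line up as expected.
(zip-partitions [["a" "b"] ["c"]] [[0 1] [2]])
;; => [["a" 0] ["b" 1] ["c" 2]]

;; Same total count, different layout: rejected.
(try
  (zip-partitions [["a" "b"] ["c"]] [[0] [1 2]])
  (catch clojure.lang.ExceptionInfo e (.getMessage e)))
;; => "partition layouts differ"
```

This is why the index RDD should be built from a finite range of exactly (f/count rdd) elements with the same partitioning as rdd.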

Whenever you use zip you should be careful, though: generally speaking, an RDD should be considered an unordered collection if there is any shuffling involved.