I wrote a function in Clojure that fills empty values in a column with the last non-empty value. I'm using flambo,
a Clojure wrapper for Apache Spark, for some of its functions.
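(ns example.core ; namespace name is a placeholder
  (:require [clojure.string :as s]
            [flambo.api :as f]))

(defn replicate-val
  "Fills blank values in column `col` with the last non-blank value seen."
  [rdd input]
  (let [{:keys [col]} input
        ;; mutable holder for the last non-blank value seen in `col`
        prev-col-val (atom [])]
    (f/map rdd
           (f/fn [row]
             (if-not (s/blank? (get row col))
               (do
                 (swap! prev-col-val assoc 0 (get row col))
                 row)
               (assoc row col (get @prev-col-val 0)))))))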
I don't really like mutating prev-col-val to keep track of state.
Any ideas on how to refactor the above so that it sticks to Clojure's immutable data structures?
Input is of the form:
[["04" "2" "3"] ["04" "" "5"] ["5" "16" ""] ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]]
and the desired output, filling the second column (col = 1), is:
[["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""] ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"]]
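To make the expected behaviour concrete, here is a minimal sketch on a plain in-memory vector, without Spark or flambo. It threads the previous (already filled) row through `reductions` instead of using an atom. The name `fill-down` is made up for illustration, and the sketch does not address how rows are partitioned in an RDD.

```clojure
(require '[clojure.string :as s])

(defn fill-down
  "Hypothetical helper: replaces blank values in column `col` with the
  last non-blank value seen in that column. Works on an in-memory
  sequence of vectors, not on an RDD."
  [rows col]
  (->> rows
       ;; The accumulator is the previous row after filling, so its
       ;; `col` value is always the last non-blank value seen so far.
       ;; It starts as nil, so leading blanks become nil, as in the
       ;; atom-based version.
       (reductions (fn [prev row]
                     (if (s/blank? (get row col))
                       (assoc row col (get prev col))
                       row))
                   nil)
       rest ; drop the initial nil accumulator
       vec))

(def input
  [["04" "2" "3"] ["04" "" "5"] ["5" "16" ""]
   ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]])

(fill-down input 1)
```

On an RDD the same step would presumably have to run within each partition, and a value would not carry across a partition boundary without extra work.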
Thanks