I'm working on a function to convert a vector into an SQL Row, so I can further convert it to a DataFrame and save it into a table using SQLContext in Apache Spark. I'm developing in Clojure and I got lost along the way. I thought of implementing the solution as follows:
- For each RDD (vector), convert it to rows
- Convert the rows to a data frame
- Save the data frame to a table
- Use the SQLContext to query for particular information in the table
- Convert the query result back into an RDD for further analysis

Here's what I have so far:
```clojure
(defn assign-ecom []
  (let [rdd-fields (-> (:rdd @transformed-rdd)
                       (f/map #(sql/row->vec %))
                       f/collect)]
    (clojure.pprint/pprint rdd-fields)))
```
I'm using the flambo v0.6.0 API to abstract the Apache Spark functions. I also welcome any suggestions as to how to go about solving the problem. Thanks.
Here's the link to the flambo row->vec docs:
I assume you already have a spark-context (`sc`) and an sql-context (`sql-ctx`). First, let's import all the stuff we'll need:
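A sketch of the imports, assuming the Spark 1.x Java API (which flambo 0.6.0 targets) plus the flambo namespaces you're already using:

```clojure
(require '[flambo.api :as f]
         '[flambo.sql :as sql])

;; Spark SQL's row and schema machinery from the Java API
(import '[org.apache.spark.sql RowFactory]
        '[org.apache.spark.sql.types StructType StructField Metadata DataTypes])
```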
For each RDD (vector), convert it to rows:
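Something like this should work; `input-rdd` is just a placeholder for whatever RDD of vectors you have (e.g. the result of `f/parallelize`):

```clojure
;; Turn each Clojure vector (e.g. ["foo" 1]) into a Spark SQL Row.
;; f/fn produces a serializable function that Spark can ship to workers.
(def row-rdd
  (f/map input-rdd (f/fn [v] (RowFactory/create (to-array v)))))
```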
Convert the rows to a data frame:
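`createDataFrame` needs a schema describing the columns. The column names `k` and `v` below are only illustrative; adjust them (and the types) to match your vectors:

```clojure
;; One StructField per column: name, type, nullable?, metadata
(def schema
  (StructType.
   (into-array StructField
               [(StructField. "k" DataTypes/StringType true (Metadata/empty))
                (StructField. "v" DataTypes/IntegerType true (Metadata/empty))])))

(def df (.createDataFrame sql-ctx row-rdd schema))
```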
Save the data frame to a table:
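Registering it as a temporary table is enough for querying; the table name `df` is arbitrary:

```clojure
;; Make the DataFrame visible to SQL under the name "df"
(.registerTempTable df "df")
```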
Use the sql-context to query for particular information in the table:
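The query is plain interop on the `SQLContext`; the `WHERE` clause here is just an example:

```clojure
;; Returns a new DataFrame holding the query result
(def results (.sql sql-ctx "SELECT k, v FROM df WHERE v > 0"))
```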
And to convert the query result back into an RDD for further analysis:
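A DataFrame already wraps an RDD of Rows, so this is a one-liner:

```clojure
;; DataFrame -> JavaRDD of Rows
(def result-rdd (.toJavaRDD results))
```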
Or, if you want vectors:
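This reuses the `row->vec` function from the flambo docs you linked:

```clojure
;; Rows back to plain Clojure vectors
(def result-vecs
  (f/map (.toJavaRDD results) (f/fn [row] (sql/row->vec row))))
```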