Clojure - No matching method found for select method in DataFrame when using Flambo


I'm using Flambo to work with Spark. I want to retrieve a DataFrame which contains given column names. I wrote a simple function as follows:

(defn make-dataset
  ([data-path column-names and-another]
    (let [data (sql/read-csv sql-context data-path)
      cols (map #(.col data %) column-names)]
      (.select data (Column. "C0")))))

I get the following exception when I execute it:

IllegalArgumentException No matching method found: select for class org.apache.spark.sql.DataFrame clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:80)

What am I doing wrong? Why does .col work whereas .select doesn't, when both are methods of the same class?

There are 2 best solutions below

Best answer

The DataFrame.select method you are trying to call has the following signature:

def select(cols: Column*): DataFrame

As you can see, it accepts a vararg of Column, whereas you pass it a single, bare Column value, which doesn't match the signature, hence the exception. Scala's varargs are wrapped in scala.collection.Seq. You can wrap your column(s) in something that implements Seq using the following code:

(scala.collection.JavaConversions/asScalaBuffer [(Column. "C0")])
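
For completeness, the wrapped sequence can be passed straight to select. The following is a minimal sketch rather than a definitive implementation: select-columns is a hypothetical helper name, data is assumed to be a DataFrame such as the one returned by sql/read-csv in the question, and the code assumes Spark 1.x on a Scala version that still ships scala.collection.JavaConversions (it was deprecated in Scala 2.12 and removed in 2.13).

```clojure
(import '[scala.collection JavaConversions])

;; Hypothetical helper, not part of Flambo or Spark.
;; Assumes `data` is an org.apache.spark.sql.DataFrame (Spark 1.x).
(defn select-columns
  [data column-names]
  ;; mapv returns a Clojure vector, which implements java.util.List,
  ;; the type asScalaBuffer expects.
  (let [cols (mapv #(.col data %) column-names)]
    ;; The resulting Buffer is a scala.collection.Seq, so the call
    ;; should resolve to the select(Seq<Column>) overload.
    (.select data (JavaConversions/asScalaBuffer cols))))
```

A call such as (select-columns data ["C0" "C1"]) would then return a DataFrame with just those columns, assuming both exist in data.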

Another answer

In Clojure, pass an array when a method takes varargs. I had the same issue, and it was resolved when I called select on the DataFrame with a String and an array of Strings.

Something like:

(def cols-vec ["a" "b" "c"])

;; Build a Column[] so the result can be passed to the varargs select.
(defn convert->spark-cols [columns] (into-array Column (map #(Column. %) columns)))

It is easy to be fooled by how Java handles varargs. When a method signature uses ... (varargs), Java source code can pass a single value because the compiler wraps it in an array, whereas Clojure interop expects the array to be passed explicitly.
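
The same rule can be tried at a REPL without Spark. Below is a minimal sketch using the JDK's java.nio.file.Paths/get, which Java declares as get(String first, String... more), the same shape as the String overload of select. The name path-of is a hypothetical helper introduced only for illustration.

```clojure
(import '[java.nio.file Path Paths])

;; Hypothetical helper: builds a Path from a non-empty sequence of name parts.
;; The trailing varargs parameter must be supplied as an explicit String[].
(defn path-of
  [parts]
  (Paths/get (first parts) (into-array String (rest parts))))

;; Three parts in, three name elements out.
(println (.getNameCount ^Path (path-of ["a" "b" "c"])))

;; With no extra parts, an empty array is still required.
(println (.getNameCount ^Path (path-of ["a"])))
```

By analogy, and assuming data is the DataFrame from the question, (.select data "C0" (into-array String ["C1"])) passes the trailing column names the same way, and (.select data (into-array Column [(Column. "C0")])) should do the same for the Column overload.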