Spark Scala DateType schema execution error

464 Views Asked by user2728349 At 17 December 2020 at 09:12

I get a runtime error when I try to create a schema for a DataFrame in Spark Scala. The error says:

Exception in thread "main" java.lang.IllegalArgumentException: No support for Spark SQL type DateType
    at org.apache.kudu.spark.kudu.SparkUtil$.sparkTypeToKuduType(SparkUtil.scala:81)
    at org.apache.kudu.spark.kudu.SparkUtil$.org$apache$kudu$spark$kudu$SparkUtil$$createColumnSchema(SparkUtil.scala:134)
    at org.apache.kudu.spark.kudu.SparkUtil$$anonfun$kuduSchema$3.apply(SparkUtil.scala:120)
    at org.apache.kudu.spark.kudu.SparkUtil$$anonfun$kuduSchema$3.apply(SparkUtil.scala:119)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.kudu.spark.kudu.SparkUtil$.kuduSchema(SparkUtil.scala:119)
    at org.apache.kudu.spark.kudu.KuduContext.createSchema(KuduContext.scala:234)
    at org.apache.kudu.spark.kudu.KuduContext.createTable(KuduContext.scala:210)

The code looks like this:

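import scala.collection.JavaConverters._
import org.apache.kudu.client.CreateTableOptions
import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}

val invoicesSchema = StructType(
    List(
        StructField("id", StringType, false),
        StructField("invoicenumber", StringType, false),
        StructField("invoicedate", DateType, true)  // this field triggers the exception
    ))

// kuduContext is an existing KuduContext
kuduContext.createTable("invoices", invoicesSchema, Seq("id","invoicenumber"), new CreateTableOptions().setNumReplicas(3).addHashPartitions(List("id").asJava, 6))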

How can I use DateType here? StringType and FloatType do not cause this issue in the same code.


There is 1 best solution below

thebluephantom On 17 December 2020 at 12:21

A workaround, as I would call it. The example below needs tailoring to your case, but I think it gives you the gist of what you need to know:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, DateType}
import org.apache.spark.sql.functions._
// Needed for toDF outside spark-shell; assumes a SparkSession named spark
import spark.implicits._

// Dates start out as plain strings
val df = Seq( ("2018-01-01", "2018-01-31", 80)
            , ("2018-01-07","2018-01-10", 10)
            , ("2018-01-07","2018-01-31", 10)
            , ("2018-01-11","2018-01-31", 5)
            , ("2018-01-25","2018-01-27", 5)
            , ("2018-02-02","2018-02-23", 100)
            ).toDF("sd","ed","coins")

// Target type for each column, given by its SQL type name
val schema = List(("sd", "date"), ("ed", "date"), ("coins", "integer"))
// Cast every column to its target type; sd and ed become DateType
val newColumns = schema.map(c => col(c._1).cast(c._2))
val newDF = df.select(newColumns:_*)
newDF.show(false)
...
...
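
Applied to the schema in the question, the same idea could look like the sketch below: keep DateType out of the schema handed to Kudu and cast the column on the Spark side. This is only an illustration. `replaceDateFields` and `kuduSafeSchema` are names made up here, not part of kudu-spark, and the choice of TimestampType as the replacement is an assumption (StringType would work the same way). The sketch uses only Spark's own type classes, so it does not touch Kudu itself.

```scala
import org.apache.spark.sql.types.{DataType, DateType, StringType, StructField, StructType, TimestampType}

object KuduSchemaWorkaround {
  // Hypothetical helper (not part of kudu-spark): swap every DateType field
  // for a replacement type, keeping the field name and nullability unchanged.
  def replaceDateFields(schema: StructType, replacement: DataType = TimestampType): StructType =
    StructType(schema.fields.map {
      case f if f.dataType == DateType => f.copy(dataType = replacement)
      case f                           => f
    })

  // The schema from the question, still declared with DateType.
  val invoicesSchema: StructType = StructType(List(
    StructField("id", StringType, nullable = false),
    StructField("invoicenumber", StringType, nullable = false),
    StructField("invoicedate", DateType, nullable = true)
  ))

  // Same fields, but invoicedate is now TimestampType.
  val kuduSafeSchema: StructType = replaceDateFields(invoicesSchema)
}
```

You would then pass `kuduSafeSchema` to `kuduContext.createTable` in place of `invoicesSchema`, cast the DataFrame column to match before writing (for example `col("invoicedate").cast("timestamp")`), and cast back with `.cast("date")` after reading, as in the example above. I believe later Kudu releases added a DATE column type, so upgrading may also be an option. Check the release notes for your version.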
