How to write "all string" dataframe to Spark JDBC in Append mode to a target table with int and varchar columns


I create a Spark DataFrame from a CSV file and try to insert it into an RDBMS table that has integer and varchar columns. Since all of my DataFrame's columns are string type, the write fails in "append" mode. If I use "overwrite" mode, the RDBMS table is recreated with all-varchar columns. How can I insert data into the RDBMS table in append mode while handling the data types in the Spark DataFrame?

1 Answer

Best Answer

When reading the CSV you can either infer the schema (inferSchema) or specify the schema programmatically.

val diamonds = spark.read.format("csv")
  .option("header", "true")        // first line holds the column names
  .option("inferSchema", "true")   // sample the file to infer column types
  .load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")

import org.apache.spark.sql.types._

val customSchema = new StructType()
  .add("_c0", IntegerType, true)
  .add("carat", DoubleType, true)
  .add("cut", StringType, true)
  .add("color", StringType, true)
  .add("clarity", StringType, true)
  .add("depth", DoubleType, true)
  .add("table", DoubleType, true)
  .add("price", IntegerType, true)
  .add("x", DoubleType, true)
  .add("y", DoubleType, true)
  .add("z", DoubleType, true)

val diamonds_with_schema = spark.read.format("csv")
  .option("header", "true")
  .schema(customSchema)
  .load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")
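If you have already loaded the file and the DataFrame is all strings, a third option is to cast the columns to the target types before writing. A minimal sketch, assuming a hypothetical all-string DataFrame allStringsDf whose "price" and "carat" columns should become integer and double:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, DoubleType}

// Cast string columns to the types expected by the target table.
// allStringsDf and the column names here are illustrative.
val typedDf = allStringsDf
  .withColumn("price", col("price").cast(IntegerType))
  .withColumn("carat", col("carat").cast(DoubleType))

Rows whose strings cannot be parsed into the target type become null after the cast, so it is worth validating the result before the write.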

Once you have a DataFrame with the required schema, you can append it to the existing table.
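The append itself can then be sketched as follows; the JDBC URL, driver, table name, and credentials below are placeholders for your own:

// Append to the existing RDBMS table. Spark matches each DataFrame
// column to the table column of the same name, so the DataFrame's
// types must be compatible with the table's int/varchar columns.
// url, dbtable, user, and password are placeholder values.
diamonds_with_schema.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/mydb")
  .option("dbtable", "diamonds")
  .option("user", "username")
  .option("password", "password")
  .mode("append")
  .save()

Because the mode is "append", Spark inserts into the existing table instead of dropping and recreating it, so the table keeps its original column types.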

Please check: https://docs.databricks.com/data/data-sources/read-csv.html