Insert selected columns from a Spark dataframe into a SQL Server table

I have a SQL Server table whose schema is different from my dataframe's. I would like to select some columns from my dataframe and "insert into" the table only those values.

Basically something similar to the SQL below, but in pyspark:

INSERT INTO Cust_Diff_Schema_tbl
(acct_num, name)
SELECT account_no, name
FROM customers
WHERE customer_id > 5000;

I can read the data via JDBC using spark.read, like below:

# note: "dbtable" and "query" are mutually exclusive in the Spark JDBC
# source, so only "query" is used here
df_s3 = spark.read.format("jdbc") \
    .option("driver", db_driver_name) \
    .option("url", db_url + ":1433;databaseName=" + stage_db) \
    .option("query", "select * from customers") \
    .option("user", db_username) \
    .option("password", db_password) \
    .load()

df_s3.printSchema()
df_s3.show(20)

To write/append the selected values to the table, I believe I can still use "df_s3.write", but I need an example of how to do the insert with the ".option" settings, or another approach if that does not work.

Thanks in advance.

There is 1 answer below.

BEST ANSWER
// create the dataframe: fetch from a db, read a file, or other options
val df = ???

df.write.format("jdbc")
  .option("url", "jdbcURL")
  .option("driver", "Driver name")
  .option("user", "username")
  .option("password", "password")
  .option("dbtable", "tablename")
  .option("numPartitions", 20)
  .option("batchsize", 10000)
  // "truncate" is only honored with mode("overwrite"); it is ignored on append
  .option("truncate", "true")
  .mode("append")
  .save()
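
In pyspark, the same append can be combined with the column selection from the question. A minimal sketch, reusing the connection variables from the question and assuming the customers columns (customer_id, account_no, name) shown in the SQL example:

from pyspark.sql import functions as F

# keep only the wanted rows and rename columns to match the target schema
df_selected = df_s3.where(F.col("customer_id") > 5000) \
    .select(F.col("account_no").alias("acct_num"), F.col("name"))

# append the selected rows into the existing table
df_selected.write.format("jdbc") \
    .option("driver", db_driver_name) \
    .option("url", db_url + ":1433;databaseName=" + stage_db) \
    .option("dbtable", "Cust_Diff_Schema_tbl") \
    .option("user", db_username) \
    .option("password", db_password) \
    .mode("append") \
    .save()

With mode("append"), Spark issues INSERT statements using the dataframe's column names, so the alias is what maps account_no to acct_num; any target-table columns absent from the dataframe must be nullable or have defaults.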