AWS Glue Data Target Upsert generates an Exception


I'm using AWS Glue to load data into a Redshift database using Glue Studio.

If the Data Target is set to Insert Only, the data is inserted without any problem. This is the generated code:

# Script generated for node Amazon Redshift
AmazonRedshift_node = glueContext.write_dynamic_frame.from_catalog(
    frame=SelectFields_node2,
    database="redshift_mast_code",
    table_name="dev_mcd_rs_iot_mast_code",
    redshift_tmp_dir="s3://glue-temp-dir-dev/",
    additional_options={
        "aws_iam_role": "arn:aws:iam::...myRole"
    },
    transformation_ctx="AmazonRedshift_node",
)

However, when I use the Upsert (Update and Insert) option, I get an exception.

2022-04-03 15:19:09,674 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/glue-scripts-tmp", line 129, in <module>
    transformation_ctx="AmazonRedshift_node",
TypeError: from_jdbc_conf() got an unexpected keyword argument 'additional_options'

And this is the code generated:

# Script generated for node Amazon Redshift
pre_query = "drop table if exists mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40;create table mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40 as select * from mcd_rs_iot.mast_code where 1=2;"
post_query = "begin;delete from mcd_rs_iot.mast_code using mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40 where mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40.cd = mcd_rs_iot.mast_code.cd; insert into mcd_rs_iot.mast_code select * from mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40; drop table mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40; end;"
AmazonRedshift_node = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=SelectFields_node2,
    catalog_connection="Redshift-Connection",
    connection_options={
        "database": "dev",
        "dbtable": "mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40",
        "preactions": pre_query,
        "postactions": post_query,
    },
    redshift_tmp_dir="s3://glue-temp-dir-dev/",
    additional_options={
        "aws_iam_role": "arn:aws:iam::...myRole"
    },
    transformation_ctx="AmazonRedshift_node",
)

How can I use the Upsert option in Glue Studio?

Glue version: 3.0 (supports Spark 3.1, Scala 2, Python 3)


Best answer:

You are passing additional_options, which from_jdbc_conf does not accept. That is exactly what the TypeError reports. See the documented signature: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame-writer.html

Remove the additional_options argument from the from_jdbc_conf call.
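
As a rough sketch of what the corrected Upsert node could look like: the call below keeps only keyword arguments that from_jdbc_conf is documented to take (frame, catalog_connection, connection_options, redshift_tmp_dir, transformation_ctx). The helper names build_connection_options and write_upsert are made up for illustration. Passing aws_iam_role inside connection_options is an assumption based on Glue's Redshift connection options, so check it against your Glue version; if the catalog connection already provides credentials, you can leave it out. The role ARN stays elided, as in the question.

```python
def build_connection_options(pre_query, post_query, iam_role=None):
    """Hypothetical helper: build connection_options for the Upsert write.

    Database and staging table names are copied from the question.
    """
    options = {
        "database": "dev",
        "dbtable": "mcd_rs_iot.stage_table_941d406a69c8480aa44ed085a2adeb40",
        "preactions": pre_query,    # creates the empty staging table
        "postactions": post_query,  # delete + insert from staging, then drop it
    }
    if iam_role:
        # Assumption: the role goes here, not in a separate
        # additional_options argument (from_jdbc_conf has none).
        options["aws_iam_role"] = iam_role
    return options


def write_upsert(glueContext, frame, pre_query, post_query, iam_role=None):
    """Hypothetical wrapper around the generated call, minus additional_options."""
    return glueContext.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection="Redshift-Connection",
        connection_options=build_connection_options(pre_query, post_query, iam_role),
        redshift_tmp_dir="s3://glue-temp-dir-dev/",
        transformation_ctx="AmazonRedshift_node",
    )
```

In the job script you would then call write_upsert(glueContext, SelectFields_node2, pre_query, post_query, "arn:aws:iam::...myRole") in place of the generated from_jdbc_conf block. Note that Glue Studio may regenerate the original code if you edit the node visually again.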