Is it possible to update and insert data in an AWS Glue database using Glue?


I am using PySpark on AWS and receive gigabytes of data every day, some of which updates earlier records. I want to look up each record's id in an existing table in the Glue database, update the row if the id already exists, and insert it if the id does not exist.

Is it possible to do this in AWS Glue?

Thanks!


There are 2 best solutions below

Best answer

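Yes, you can use the Glue PySpark extensions for this. The sink below writes the frame to S3 and, with enableUpdateCatalog=True, updates the Data Catalog table in the same job: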

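from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Sink that writes to S3 and updates the catalog table in the same run.
data_sink = glue_context.getSink(
    path="s3_path",  # placeholder: S3 location of the table data
    connection_type="s3",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=["partition_column"],
    compression="snappy",
    enableUpdateCatalog=True,
)
# database_name and table_name identify the target catalog table.
data_sink.setCatalogInfo(
    catalogDatabase=database_name,
    catalogTableName=table_name,
)
data_sink.setFormat("glueparquet")
# writeFrame expects a Glue DynamicFrame, not a plain Spark DataFrame.
data_sink.writeFrame(data_frame)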
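
One caveat, as far as I understand the sink options: updateBehavior and enableUpdateCatalog control how the catalog table's schema and partitions are updated, not whether individual rows are matched by id. If you need row-level update-or-insert, you can merge the frames in Spark before writing. Below is a minimal sketch of that merge, not a definitive implementation. The function name upsert_by_key and the default key column "id" are my own assumptions, and it assumes the incoming batch holds at most one row per id.

```python
def upsert_by_key(existing_df, incoming_df, key="id"):
    """Merge two Spark DataFrames with the same columns (hypothetical helper).

    Rows from incoming_df replace existing rows that share the same key;
    incoming rows with new keys are appended.
    """
    # Keep only the existing rows whose key does NOT appear in the new batch.
    untouched = existing_df.join(
        incoming_df.select(key), on=key, how="left_anti"
    )
    # Every incoming row is kept: it is either an update or an insert.
    return untouched.unionByName(incoming_df)
```

In a Glue job you would typically get the two DataFrames by calling toDF() on the DynamicFrames, then wrap the merged result with DynamicFrame.fromDF(merged, glue_context, "merged") before passing it to writeFrame. Writing the merged result to a different S3 location than the one being read avoids overwriting data that Spark is still reading.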

You can run Athena queries from inside the Glue job to implement your update-or-insert logic. See https://docs.aws.amazon.com/athena/latest/ug/querying-athena-tables.html
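
To make that concrete, here is a rough sketch of submitting a query from Python and waiting for it to finish. The helper name run_athena_query and its parameters are my own. The client is assumed to be a boto3 Athena client, and start_query_execution and get_query_execution are the two calls it relies on. Which SQL statements you can run for the update part depends on your table format, so the query text is left to you.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, return its execution id.

    client is assumed to be boto3.client("athena"); output_location is an
    S3 URI where Athena stores the query results.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status["QueryExecution"]["Status"].get(
                "StateChangeReason", ""
            )
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        time.sleep(poll_seconds)
```

Inside the Glue job you would create the client with boto3.client("athena") and call the helper once per statement. The job's IAM role needs permission to run Athena queries and to write to the output location.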