Loading around 50 GB of Parquet data into Redshift takes an indefinite amount of time


I am loading around 50 GB of Parquet data into a DataFrame using an AWS Glue ETL job and then trying to load it into a Redshift table; the write has been running for more than 6-7 hours without completing.

```python
datasink = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=<data_frame>,
    catalog_connection="redshift_connection",
    connection_options={
        "preactions": pre_actions,
        "dbtable": dest_table,
        "database": "<redshift_database>",
    },
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="datasink",
)
```
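For context, `from_jdbc_conf` stages the frame as files under `redshift_tmp_dir` in S3 and then issues a Redshift `COPY`. Below is a minimal sketch of one commonly suggested tweak, assuming the `"extracopyoptions"` connection option documented for Glue's Redshift connections is available on your Glue version: passing `COPY` options that skip compression analysis and statistics updates during the load.

```python
# Sketch only: the same call as above, with the COPY tuned via
# "extracopyoptions" (assumes your Glue version supports this option).
datasink = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=source_frame,  # hypothetical name for the DynamicFrame built earlier
    catalog_connection="redshift_connection",
    connection_options={
        "preactions": pre_actions,
        "dbtable": dest_table,
        "database": "my_redshift_db",  # hypothetical database name
        # Skip compression analysis and stats collection during the COPY;
        # run ANALYZE on the table separately after the load finishes.
        "extracopyoptions": "COMPUPDATE OFF STATUPDATE OFF",
    },
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="datasink",
)
```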

Are there any performance improvement techniques one should follow?

I have tried partitioning the data (see the sketch below) and made significant changes to the resource configuration; I am currently running with the G.2X worker type and 16 workers.
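One more angle worth checking, sketched below with hypothetical names (`df` for the 50 GB source as a Spark DataFrame, `NUM_OUTPUT_PARTITIONS` as an assumed tuning knob, not a measured value): the partition count at write time controls how many files Glue stages in the S3 temp dir, and thousands of tiny files or a handful of huge ones can both slow the `COPY`. Repartitioning to a moderate count, ideally a small multiple of the Redshift cluster's slice count, before converting back to a DynamicFrame is a common adjustment:

```python
from awsglue.dynamicframe import DynamicFrame

# Hypothetical: aim for a file count that is a small multiple of the
# Redshift cluster's slice count so each slice loads a similar amount.
NUM_OUTPUT_PARTITIONS = 64  # tune to your cluster; assumption, not a measured value

balanced_df = df.repartition(NUM_OUTPUT_PARTITIONS)
balanced_frame = DynamicFrame.fromDF(balanced_df, glueContext, "balanced_frame")
# ...then pass balanced_frame as `frame=` in the from_jdbc_conf call above.
```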
