I am writing files from a relational database source to S3 using Glue. I would like the S3 path to be in the format `bucket_name/database/schema/table/year/month/day`. I am reading the bucket name, database, schema, and table name from a configuration file, and I would like to use those parameters to dynamically build the S3 path where I save the source files. I am writing the files to S3 using a Glue dynamic frame.
In the Glue script I build the path dynamically as: `s3_target_path = 's3://' + target_bucket_name + '/' + database + '/' + schema + '/' + table + '/' + year + '/' + month + '/' + day`
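For reference, this is roughly how I load the parameters and build the path (the config file name, key names, and output format are just examples from my setup):

```python
import json
from datetime import date

# Load the target location pieces from the configuration file.
with open("config.json") as f:
    cfg = json.load(f)

today = date.today()
s3_target_path = (
    f"s3://{cfg['target_bucket_name']}/{cfg['database']}/{cfg['schema']}/"
    f"{cfg['table']}/{today:%Y}/{today:%m}/{today:%d}"
)

# Write the source data out with a Glue dynamic frame.
glueContext.write_dynamic_frame.from_options(
    frame=dyf,  # the DynamicFrame read from the relational source
    connection_type="s3",
    connection_options={"path": s3_target_path},
    format="parquet",
)
```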
Glue's `DynamicFrame` supports writing data with Hive-style partition names (key=value). See https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html#aws-glue-programming-etl-partitions-writing.
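If your data has year/month/day columns, that looks roughly like the sketch below. The `partitionKeys` connection option is the documented mechanism; the frame and column names are assumptions:

```python
# Hive-style partitioned write with a DynamicFrame. Note that this produces
# key=value directories, e.g. .../table/year=2023/month=01/day=15/,
# not the bare .../table/2023/01/15/ layout you asked for.
glueContext.write_dynamic_frame.from_options(
    frame=dyf,  # assumed: a DynamicFrame that has year/month/day columns
    connection_type="s3",
    connection_options={
        "path": f"s3://{target_bucket_name}/{database}/{schema}/{table}",
        "partitionKeys": ["year", "month", "day"],
    },
    format="parquet",
)
```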
That page says that you have to convert to a Spark `DataFrame` if you want to apply an alternate partitioning scheme (such as your bare year/month/day layout). I've never done this, but I have used an RDD like so (a sketch follows the list):

- `map()` to add the output key (eg: `xxx/yyy/yyyy/mm/dd`)
- `groupBy()` with that key field
- `foreach()` with a function to write the output files
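A rough PySpark sketch of those three steps; the column names, the serialization, and the boto3 upload are all assumptions, not something from the docs:

```python
import boto3

# database, schema, table, target_bucket_name come from your config file.

def output_key(row):
    # Build the bare (non key=value) partition path from assumed row fields.
    return f"{database}/{schema}/{table}/{row['year']}/{row['month']}/{row['day']}"

def write_group(pair):
    key, rows = pair
    # Serialize the group however your target format requires; CSV-ish here.
    body = "\n".join(",".join(str(v) for v in r) for r in rows)
    boto3.client("s3").put_object(
        Bucket=target_bucket_name, Key=f"{key}/part-00000", Body=body
    )

rdd = dyf.toDF().rdd                    # DynamicFrame -> DataFrame -> RDD
(rdd.map(lambda r: (output_key(r), r))  # map(): attach the output key
    .groupByKey()                       # the groupBy() step, in keyed form
    .foreach(write_group))              # foreach(): write one object per group
```

One caveat with this approach: `groupByKey()` materializes each group's rows on a single executor, so it only works comfortably when each output day's data is small.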