I am writing files from a relational database source to S3 using Glue. I would like the S3 path to be in the format `bucket_name/database/schema/table/year/month/day`. I am reading the bucket name, database, schema, and table name from a configuration file, and I would like to use those parameters to dynamically build the S3 path where I save the source files. I am writing the files to S3 using a Glue dynamic frame.
In the Glue script I build the path dynamically as: `s3_target_path = 's3://' + target_bucket_name + '/' + database + '/' + schema + '/' + table + '/' + year + '/' + month + '/' + day`
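For reference, this is roughly how I load the parameters and build the path (the config file name, key names, and output format are just examples from my setup):

```python
import json
from datetime import date

# Load the target location pieces from the configuration file.
with open("config.json") as f:
    cfg = json.load(f)

today = date.today()
s3_target_path = (
    f"s3://{cfg['target_bucket_name']}/{cfg['database']}/{cfg['schema']}/"
    f"{cfg['table']}/{today:%Y}/{today:%m}/{today:%d}"
)

# Write the source data out with a Glue dynamic frame.
glueContext.write_dynamic_frame.from_options(
    frame=dyf,  # the DynamicFrame read from the relational source
    connection_type="s3",
    connection_options={"path": s3_target_path},
    format="parquet",
)
```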
Glue's `DynamicFrame` supports writing data with Hive-style partition names (key=value). See https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html#aws-glue-programming-etl-partitions-writing.
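If your data has year/month/day columns, that looks roughly like the sketch below. The `partitionKeys` connection option is the documented mechanism; the frame and column names are assumptions:

```python
# Hive-style partitioned write with a DynamicFrame. Note that this produces
# key=value directories, e.g. .../table/year=2023/month=01/day=15/,
# not the bare .../table/2023/01/15/ layout you asked for.
glueContext.write_dynamic_frame.from_options(
    frame=dyf,  # assumed: a DynamicFrame that has year/month/day columns
    connection_type="s3",
    connection_options={
        "path": f"s3://{target_bucket_name}/{database}/{schema}/{table}",
        "partitionKeys": ["year", "month", "day"],
    },
    format="parquet",
)
```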
That page says that you have to convert to a Spark `DataFrame` if you want to apply an alternate partitioning scheme (such as your bare year/month/day layout). I've never done this, but I have used an RDD like so (a sketch follows the list):

- `map()` to add the output key (eg: `xxx/yyy/yyyy/mm/dd`)
- `groupBy()` with that key field
- `foreach()` with a function to write the output files
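A rough PySpark sketch of those three steps; the column names, the serialization, and the boto3 upload are all assumptions, not something from the docs:

```python
import boto3

# database, schema, table, target_bucket_name come from your config file.

def output_key(row):
    # Build the bare (non key=value) partition path from assumed row fields.
    return f"{database}/{schema}/{table}/{row['year']}/{row['month']}/{row['day']}"

def write_group(pair):
    key, rows = pair
    # Serialize the group however your target format requires; CSV-ish here.
    body = "\n".join(",".join(str(v) for v in r) for r in rows)
    boto3.client("s3").put_object(
        Bucket=target_bucket_name, Key=f"{key}/part-00000", Body=body
    )

rdd = dyf.toDF().rdd                    # DynamicFrame -> DataFrame -> RDD
(rdd.map(lambda r: (output_key(r), r))  # map(): attach the output key
    .groupByKey()                       # the groupBy() step, in keyed form
    .foreach(write_group))              # foreach(): write one object per group
```

One caveat with this approach: `groupByKey()` materializes each group's rows on a single executor, so it only works comfortably when each output day's data is small.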