How to add date to a file's name using UNLOAD in Redshift


I found 2 solutions:

  1. Using AWS Data Pipeline to schedule the UNLOAD query and write to 's3://reporting-team-bucket/importfiles/test_123-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.csv'
  2. Writing an mv command to rename the file in the S3 bucket

Is there a way to put the current date in the file's name using only Redshift, with no other services?

Here is my code so far:

unload
(
'select * from table'
)
to 's3://bucket/unload_test/test_123_{CurrentDate}.gz'
ACCESS_KEY_ID '12345678910'
SECRET_ACCESS_KEY '10987654321'
GZIP
PARALLEL off; 

I just need {CurrentDate} to resolve to a value such as 202106.

Thanks!


There are 2 best solutions below

Best answer

I've never tried using UNLOAD in a transaction, but if it works, you could use a procedure.
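
As a minimal sketch of the procedure idea: Redshift stored procedures can build a statement as a string and run it with EXECUTE, so the date can be concatenated into the S3 path. The procedure name unload_with_date, the table name my_table, and the IAM role ARN are hypothetical placeholders, and this assumes UNLOAD is permitted inside a procedure in your setup.

```sql
-- Sketch only: unload_with_date, my_table and the role ARN are placeholders.
CREATE OR REPLACE PROCEDURE unload_with_date()
AS $$
DECLARE
    date_suffix VARCHAR(6);
    unload_sql  VARCHAR(65535);
BEGIN
    -- Year and month of the run date, e.g. 202106 for a run in June 2021
    date_suffix := TO_CHAR(CURRENT_DATE, 'YYYYMM');

    -- Build the UNLOAD statement; quotes inside the string are doubled
    unload_sql :=
           'UNLOAD (''select * from my_table'') '
        || 'TO ''s3://bucket/unload_test/test_123_' || date_suffix || '.gz'' '
        || 'IAM_ROLE ''arn:aws:iam::123456789012:role/my-unload-role'' '
        || 'GZIP PARALLEL OFF';

    EXECUTE unload_sql;
END;
$$ LANGUAGE plpgsql;

-- Run it (for example from a scheduled query)
CALL unload_with_date();
```

Note that UNLOAD treats the TO value as a name prefix and appends its own suffix, so the final object name will not be exactly the string given here.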

Answer

Redshift's UNLOAD has an option to write the data by partition. Use **PARTITION BY (COLUMN_NAME)**. Here is an example:

unload ('
    SELECT col1
         , col2
         , col3
         , current_date as partition_by_me
    FROM dummy
')
to 's3://mybucket/dummy/'
partition by (partition_by_me)
iam_role 'arn of IAM role'
kms_key_id 'arn of kms key'
encrypted
FORMAT AS PARQUET;

In the above example, I added a dummy column, partition_by_me, set to current_date, and used it in the UNLOAD command as partition by (partition_by_me). The data in S3 lands in the partition for that date.

The S3 path would be: s3://mybucket/dummy/partition_by_me=2022-08-18/000.parquet

Timestamp with time zone does work with this.

Note: by default the dummy column is not written to the S3 files as an additional column. If you want it in the unloaded data set, add the INCLUDE keyword to the clause:

partition by(partition_by_me) INCLUDE

With INCLUDE, the partition column is also written to the exported files.
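
To get a year-and-month value such as 202106 (the format asked for in the question) instead of a full date, the partition column can be formatted with TO_CHAR. This is a sketch that reuses the placeholder names from the example above; the resulting prefix is expected to look like s3://mybucket/dummy/partition_by_me=202106/, but check it against your own bucket.

```sql
-- Sketch only: same placeholder table, bucket and role as the example above.
-- Single quotes inside the quoted query are doubled.
UNLOAD ('
    SELECT col1
         , col2
         , col3
         , TO_CHAR(current_date, ''YYYYMM'') AS partition_by_me
    FROM dummy
')
TO 's3://mybucket/dummy/'
PARTITION BY (partition_by_me)
IAM_ROLE 'arn of IAM role'
FORMAT AS PARQUET;
```

The date ends up in the folder name rather than in the file name itself, which may or may not suit the downstream consumer.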