Location of S3 data for Amazon Athena

5.1k Views Asked by At

I have created an Amazon S3 bucket and uploaded a flat file (the famous Iris flower data set data as csv).

I would now like to create an Iris dataset flat table in Amazon Athena and query it. I just cannot find the 'Location of Input Data Set'.

How do I determine the location of my flat Iris file in the S3 bucket? Is there possibly a tutorial for the above situation (google did not help much yet)?

2

There are 2 best solutions below

2
On BEST ANSWER

As per the Amazon Athena CREATE TABLE documentation, the syntax to create a table is:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS]
 [db_name.]table_name [(col_name data_type [COMMENT col_comment] [, ...] )]
 [COMMENT table_comment]
 [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
 [ROW FORMAT row_format]
 [STORED AS file_format] [WITH SERDEPROPERTIES (...)] ]
 [LOCATION 's3_loc']
 [TBLPROPERTIES ( ['has_encrypted_data'='true | false',] ['classification'='aws_glue_classification',] property_name=property_value [, ...] ) ]

The s3_loc is:

Specifies the location of the underlying data in Amazon S3 from which the table is created, for example, s3://mystorage/. For more information about considerations such as data format and permissions, see Create Tables From Underlying Data in Amazon S3.

Use a trailing slash for your folder or bucket. Do not use file names or glob characters.

Use: s3://mybucket/myfolder/

Don't use: s3://path_to_bucket s3://path_to_bucket/* s3://path_to-bucket/mydatafile.dat

Therefore, if you have stored your flat file in a bucket called my-bucket within a directory called iris, you would use:

LOCATION s3://my-bucket/iris/

Note that you point to the directory, not the file. This is because many data sets are stored as multiple files (and even multiple sub-directories).

1
On

If you have AWS CLI installed, then you can use it to find the file:

aws s3 ls s3://bucket_name --recursive | grep iris_csv_file