Different ways to run ad-hoc analytics on top of S3


I have a data lake in AWS S3. The data is stored in Parquet format, and the daily workload is ~70 GB. I want to build some ad-hoc analytics on top of that data. To do that, I see two options:

  1. Use AWS Athena to query the data directly in S3 with SQL, using the AWS Glue Data Catalog for the table definitions (Athena's DDL is HiveQL-based). See the first sketch after this list.
  2. Move the data from S3 into Redshift as a data warehouse and run the ad-hoc queries against Redshift (see the COPY sketch after this list).
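
For option 1, the setup would look roughly like the sketch below; the database, table, schema, and S3 path are placeholders for my actual layout:

```sql
-- Hypothetical table over the Parquet files in the lake (HiveQL DDL).
CREATE EXTERNAL TABLE IF NOT EXISTS datalake.events (
  event_id   string,
  event_time timestamp,
  payload    string
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://my-data-lake/events/';

-- Register the existing dt= partitions in the Glue Data Catalog.
MSCK REPAIR TABLE datalake.events;

-- Example ad-hoc query; Athena reads only the referenced Parquet columns.
SELECT dt, count(*) AS events
FROM datalake.events
WHERE dt BETWEEN '2024-01-01' AND '2024-01-07'
GROUP BY dt
ORDER BY dt;
```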
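For option 2, the loading step would be something like this COPY sketch; the table schema and IAM role ARN are again placeholders:

```sql
-- Hypothetical target table matching the columns inside the Parquet files.
CREATE TABLE IF NOT EXISTS analytics.events (
  event_id   varchar(64),
  event_time timestamp,
  payload    varchar(max)
);

-- Load one day's Parquet files from the lake into Redshift.
COPY analytics.events
FROM 's3://my-data-lake/events/dt=2024-01-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
FORMAT AS PARQUET;
```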

What is the best way to do ad-hoc analysis in my case? Is there a more efficient way? And what are the pros and cons of the options mentioned?

PS

After 6 months I'm going to move the data from S3 to Amazon Glacier, so the maximum data volume to query in S3/Redshift would be ~13 TB.
