Use AWS Athena To Query S3 Object Tagging


Is it possible to use AWS Athena to query S3 Object Tagging? For example, if I have an S3 layout such as this

bucketName/typeFoo/object1.txt
bucketName/typeFoo/object2.txt
bucketName/typeFoo/object3.txt

bucketName/typeBar/object1.txt
bucketName/typeBar/object2.txt
bucketName/typeBar/object3.txt

And each object has an S3 Object Tag such as this

#For typeFoo/object1.txt and typeBar/object1.txt
id=A

#For typeFoo/object2.txt and typeBar/object2.txt
id=B

#For typeFoo/object3.txt and typeBar/object3.txt
id=C

Then is it possible to run an AWS Athena query to get any object with the associated tag such as this

select * from myAthenaTable where tag.id = 'A'
# returns typeFoo/object1.txt and typeBar/object1.txt

This is just an example and doesn't reflect my actual S3 bucket/object-prefix layout. Feel free to use any layout you wish in your answers/comments.

Ultimately, I have a plethora of objects that could be in different buckets and folder paths, but they are related to each other. My goal is to tag them so that I can query for a particular id value and get all objects related to that id. The id value would be a GUID that maps to many different types of related objects; e.g., I could have a video file, a picture file, a metadata file, and a JSON file, and I want to get all of those files using their common id value. Please feel free to offer suggestions too, because I have the ability to structure this as I see fit.

Update - Note S3 Object Metadata and S3 Object Tagging are two different things.


There is 1 solution below


Athena does not support querying based on S3 object tags.

One workaround is to maintain a metadata file that maps each object's name to its tags, using a Lambda function: whenever a new object arrives in S3, the Lambda records the object's name and tag details in a mapping file in S3, which Athena can then query.
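A minimal sketch of that workaround in Python, assuming the Lambda is triggered by S3 "object created" notifications. The names `INDEX_BUCKET`, `INDEX_PREFIX`, `build_record`, and `handler` are hypothetical. Note one deliberate change from the description above: rather than updating a single shared file (which would need a read-modify-write on every upload), this version writes one small JSON file per object under an index prefix, and Athena can read the whole prefix as one table.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical names: a separate bucket is assumed for the index so that
# writing index files does not re-trigger the same Lambda.
INDEX_BUCKET = "my-tag-index-bucket"
INDEX_PREFIX = "tag-index/"


def build_record(bucket, key, tag_set):
    """Flatten an S3 TagSet into one JSON line that Athena can read."""
    tags = {t["Key"]: t["Value"] for t in tag_set}
    return json.dumps(
        {"bucket": bucket, "key": key, "id": tags.get("id"), "tags": tags}
    )


def handler(event, context, s3=None):
    """Write one index record per object named in the S3 event."""
    if s3 is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        s3 = boto3.client("s3")

    written = []
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(rec["s3"]["object"]["key"])

        tag_set = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
        line = build_record(bucket, key, tag_set)

        index_key = f"{INDEX_PREFIX}{bucket}/{key}.json"
        s3.put_object(
            Bucket=INDEX_BUCKET,
            Key=index_key,
            Body=(line + "\n").encode("utf-8"),
        )
        written.append(index_key)
    return written
```

With an Athena table defined over the index prefix using a JSON SerDe (columns such as `bucket`, `key`, `id`, and `tags`), a query along the lines of `select bucket, key from myAthenaTable where id = 'A'` should return the matching objects. This sketch only sees tags that are present when the object is created; tags added or changed later would need the Lambda to be triggered by the object tagging events as well.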