Querying Parquet from S3 using Bloom filter

487 Views Asked by sancholp At 27 July 2025 at 14:22

I have some data in an S3 bucket in Parquet format. The data consists of various datasets, each containing a UUID key followed by values. I need to query individual UUIDs.

My question is whether the metadata in each Parquet file (specifically the Bloom filter) can be used to check whether a specific UUID might be located in that file before querying it. The idea is to avoid querying every single file in the hope of finding the required data, as this would take far too long.

Ideally, I would go through each file in the bucket, read its metadata, and check whether the requested UUID has been hashed into that file's Bloom filter. When I find a file that may contain the UUID, I would query it (e.g. with S3 Select).
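To make the idea concrete, here is a minimal sketch of the pruning loop I have in mind. It is only an illustration: `ToyBloomFilter` is a hand-rolled filter built on SHA-256, not the split-block Bloom filter that Parquet actually stores, and the file names and UUIDs are made up. It does not read anything from S3 or from real Parquet metadata. It only shows the "definitely not here / might be here" check I would like to run per file before querying.

```python
import hashlib
import uuid


class ToyBloomFilter:
    """Illustrative Bloom filter. NOT Parquet's on-disk split-block format."""

    def __init__(self, num_bits=8192, num_hashes=5):
        self.num_bits = num_bits          # must be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def candidate_files(filters_by_file, key):
    """Return only the files whose filter cannot rule out the key."""
    return [name for name, bf in filters_by_file.items() if bf.might_contain(key)]


# Hypothetical stand-ins for per-file Bloom filters (assumption: one per file).
filters = {}
keys_by_file = {}
for name in ("part-a.parquet", "part-b.parquet", "part-c.parquet"):
    keys = [str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{name}-{i}")) for i in range(100)]
    bf = ToyBloomFilter()
    for k in keys:
        bf.add(k)
    filters[name] = bf
    keys_by_file[name] = keys

target = keys_by_file["part-b.parquet"][42]
hits = candidate_files(filters, target)
print(hits)  # only these files would then be queried, e.g. with S3 Select
```

A Bloom filter never gives false negatives, so the file that really holds the UUID is always in `hits`. It can give false positives, so an occasional extra file may still need to be queried.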
