How to read all parquet files from S3 using awswrangler in python

I need to read all the files with the .parquet extension:

import awswrangler as wr

s3_path = "s3://buckte/table/files.parquet"

df = wr.s3.read_parquet(
    path=[s3_path]
)

but I still get an error:

Error occurred (404) when calling the HeadObject

There are 2 answers below.

Answer 1:

The trick is to pass the S3 prefix as a single string for path and to filter the files with path_suffix:

s3_path = "s3://buckte/table"

df = wr.s3.read_parquet(
    path=s3_path,
    path_suffix=".snappy.parquet",
    use_threads=True
)
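
A related option that this answer does not mention is awswrangler's dataset=True flag on read_parquet, which reads every parquet file under the prefix and resolves Hive-style partition folders into DataFrame columns. A minimal sketch, reusing the prefix from the question (adjust the bucket and path to your data):

import awswrangler as wr

# Read everything under the prefix as a single parquet dataset;
# dataset=True also turns Hive-style partitions (e.g. year=2024/) into columns.
df = wr.s3.read_parquet(
    path="s3://buckte/table/",
    dataset=True,
    use_threads=True
)
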
Answer 2:

You are getting this error because the file you are trying to read was not found, or because the location you are trying to read from doesn't exist.
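
A quick way to check what actually exists under a prefix before reading is to list the keys. A minimal sketch using wr.s3.list_objects (the bucket and prefix are placeholders):

import awswrangler as wr

# List the object keys under the prefix to confirm they exist
# and to see the exact file extensions (.parquet vs .snappy.parquet).
keys = wr.s3.list_objects("s3://bucket/table/")
print(keys)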

You can either specify the exact (and correct) location of the file you want to access, or, if you want to read all the parquet files from a folder, specify just the folder and filter by extension (e.g. ".parquet" or ".snappy.parquet") through the path_suffix argument.

The following code reads all the parquet files within the folder 'table':

df = wr.s3.read_parquet(
    path="s3://bucket/table/",
    path_suffix=".parquet"
)

If you want to read all the parquet files in the entire bucket, point path at the bucket root:

df = wr.s3.read_parquet(
    path="s3://bucket/",
    path_suffix=".parquet"
)
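
Reading a whole bucket can pull in a lot of data. A hedged sketch of two ways to limit the load, assuming the columns and chunked parameters of wr.s3.read_parquet (the column names are hypothetical):

import awswrangler as wr

# Load only the columns you need; "id" and "value" are hypothetical names.
df = wr.s3.read_parquet(
    path="s3://bucket/",
    path_suffix=".parquet",
    columns=["id", "value"]
)

# Or iterate over the data in chunks instead of one large DataFrame.
for chunk in wr.s3.read_parquet(
    path="s3://bucket/",
    path_suffix=".parquet",
    chunked=True
):
    print(len(chunk))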