Can't get correct statistics from fastparquet

I am getting None for the statistics (min / max) when reading a file from S3 using fastparquet. When calling

fp.ParquetFile(fn=path, open_with=myopen).statistics['min']

most of the values are None, and only some of the values are valid.

However, when I read the same file with another framework, I am able to get the correct min/max for all values.

How can I get all the statistics? Thanks

There is 1 solution below

The full set of row groups is available as the list

pf = fp.ParquetFile(fn=path, open_with=myopen)
pf.row_groups

and each row group has a .columns attribute, a list of column chunks, each of which in turn has meta_data; so you can dig around to see what the individual min/max of each column chunk is.
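As a rough sketch of that digging, the helper below walks the row groups and collects whatever statistics each column chunk carries. The helper names (collect_raw_stats, _first_set) are made up for illustration. The field names (path_in_schema, statistics, min / max, min_value / max_value, null_count) come from the Parquet metadata definition, and I am assuming fastparquet exposes them as plain attributes in your version. The values are the raw encoded bytes stored in the file, so they may still need decoding according to each column's type.

```python
def _first_set(stats, *names):
    """Return the first attribute in `names` that is present and not None."""
    if stats is None:
        return None
    for name in names:
        value = getattr(stats, name, None)
        if value is not None:
            return value
    return None


def collect_raw_stats(row_groups):
    """Return one record per (row group, column chunk) with its raw statistics.

    `row_groups` is assumed to look like pf.row_groups: each item has a
    .columns list, and each column chunk has .meta_data.
    """
    records = []
    for rg_index, rg in enumerate(row_groups):
        for chunk in rg.columns:
            md = chunk.meta_data
            # Path parts may be str or bytes depending on the reader version.
            parts = [p.decode() if isinstance(p, bytes) else p
                     for p in md.path_in_schema]
            stats = getattr(md, "statistics", None)
            records.append({
                "row_group": rg_index,
                "column": ".".join(parts),
                # Some writers fill min/max, others min_value/max_value.
                "min": _first_set(stats, "min_value", "min"),
                "max": _first_set(stats, "max_value", "max"),
                "null_count": getattr(stats, "null_count", None),
            })
    return records


# Intended usage (assumes pf was opened as shown above):
#   for rec in collect_raw_stats(pf.row_groups):
#       print(rec)
```

Looking at both the older min / max fields and the newer min_value / max_value fields may explain the None values: if the file's writer only filled one pair, a reader that checks only the other pair will report nothing for those columns. That is a guess worth checking against the raw metadata, not a confirmed diagnosis.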