Access Pages, PageHeaders and encodings of a parquet file

I'm having trouble accessing metadata at the Page level of a parquet file.

I'm able to access metadata and encodings at the ColumnChunk level only.

I'm trying to do this because the encodings at the ColumnChunk level often have this value in my files: [RLE, RLE_DICTIONARY, PLAIN], which seems odd to me. I read that the dictionary page is encoded as PLAIN, which makes sense, and that the dictionary can get full, which means some data pages use the dictionary and some don't. Would the former be encoded as RLE_DICTIONARY and the latter as RLE?

With pyarrow, I'm able to print metadata at the ColumnChunk level, including the encodings, but it does not seem to be possible to access Pages and PageHeaders.

With the parquet-mr Java library, I was able to do the same thing, but I cannot find the objects and functions needed to reach PageHeaders and the encoding at the Page level. Does anyone know how to do it?
