Where do I find the ParquetDatasetPiece class?


While reading the petastorm/etl/dataset_metadata.py script, I found this code:

if row_groups_key != ".":
    for row_group in range(row_groups_per_file[row_groups_key]):
        rowgroups.append(pq.ParquetDatasetPiece(
            piece.path,
            open_file_func=dataset.fs.open, 
            row_group=row_group, 
            partition_keys=piece.partition_keys
        ))

where pq is imported as:

from pyarrow import parquet as pq

I've searched everywhere for the ParquetDatasetPiece class and can't find it. Can somebody tell me where the ParquetDatasetPiece class is defined?

Best answer

You can find it in the parquet part of the pyarrow codebase: https://github.com/apache/arrow/blob/951663a41c183c8fec5a4da9a8f9daf45ed85451/python/pyarrow/parquet/core.py#L1059-L1084

Note: ParquetDatasetPiece has been deprecated since pyarrow version 5.0, so it may be missing from newer releases.
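
If you would rather check your own installation than browse GitHub, something like the following minimal sketch may help. It asks Python which source file defines a given attribute of a module. The helper name locate_attribute is hypothetical (it is not part of pyarrow or petastorm), and it returns None when the module is not installed or when the attribute does not exist in the installed version, which is what you would expect for a class that has been removed.

```python
import importlib
import inspect


def locate_attribute(module_name, attr_name):
    """Return the path of the source file defining module_name.attr_name.

    Returns None if the module cannot be imported, the attribute is
    missing, or no Python source file is available for it.
    (Hypothetical helper, written only for illustration.)
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # module is not installed in this environment

    obj = getattr(module, attr_name, None)
    if obj is None:
        return None  # attribute does not exist in the installed version

    try:
        return inspect.getsourcefile(obj)
    except (TypeError, OSError):
        return None  # built-in or C-extension object without Python source


if __name__ == "__main__":
    # Prints a file path if the installed pyarrow still ships the class,
    # otherwise prints None.
    print(locate_attribute("pyarrow.parquet", "ParquetDatasetPiece"))
```

If this prints None even though pyarrow is installed, the class is probably no longer present in that version, and the linked commit on GitHub remains the place to read its source.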