I am working on a very large dataset with 20 million+ records. I am trying to save all that data in the Feather format for faster access, and also to append to it as I proceed with my analysis.
Is there a way to append a pandas DataFrame to an existing Feather file?
Feather files are intended to be written in a single pass, so appending to them is not a supported use case.
Instead, for such a large dataset I would recommend writing the data into individual Apache Parquet files using `pyarrow.parquet.write_table` or `pandas.DataFrame.to_parquet`, and reading it back into pandas using `pyarrow.parquet.ParquetDataset` or `pandas.read_parquet`. These functions can treat a collection of Parquet files as a single dataset that is read at once into a single DataFrame.