fastparquet error when saving pandas df to parquet: AttributeError: module 'fastparquet.parquet_thrift' has no attribute 'SchemaElement'

2.7k Views Asked by Dulshan At 28 July 2025 at 01:34

```python
import pandas as pd
from flatten_json import flatten

# Expected output columns; "dob.timestamp" and "dob_1.timestamp" come from flattening nested dicts with "."
actual_column_list = ["_id", "external_id", "email", "created_at", "updated_at", "dob.timestamp", "dob_1.timestamp", "column_10"]

data = [{'_id': '60efe3333333445', 'external_id': 'ID2', 'dob': {'timestamp': 412214400}, 'email': '[email protected]', 'created_at': 1626334203, 'updated_at': 1629338900},
        {'external_id': 'ID3', '_id': '60efe3333333487', 'email': '[email protected]', 'created_at': 1626334203, 'updated_at': 1629338900, 'dob_1': {'timestamp': 'oops'}}]

# Flatten each record and force string dtype; columns missing from every record (e.g. column_10) are all null
df = pd.DataFrame(data=[flatten(row, ".") for row in data], dtype='str', columns=actual_column_list)

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)

df.to_parquet("test.parquet", engine='fastparquet', compression="snappy", index=False)
```
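The dotted column names in `actual_column_list` (such as `dob.timestamp`) come from `flatten(row, ".")`, which joins nested keys with the given separator. Below is a minimal pure-Python sketch of that behaviour, assuming only nested dicts need handling. `flatten_dict` is a hypothetical stand-in, not part of flatten_json, and the real `flatten_json.flatten` also expands lists into indexed keys.

```python
from typing import Any, Dict


def flatten_dict(nested: Dict[str, Any], separator: str = ".", parent: str = "") -> Dict[str, Any]:
    """Hypothetical stand-in for flatten_json.flatten: joins nested dict keys with `separator`.

    Only non-empty dicts are expanded; lists and all other values are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        # Prefix the key with its parent path, e.g. "dob" + "." + "timestamp"
        name = f"{parent}{separator}{key}" if parent else key
        if isinstance(value, dict) and value:
            flat.update(flatten_dict(value, separator, name))
        else:
            flat[name] = value
    return flat


row = {'_id': '60efe3333333445', 'dob': {'timestamp': 412214400}, 'created_at': 1626334203}
print(flatten_dict(row))
# {'_id': '60efe3333333445', 'dob.timestamp': 412214400, 'created_at': 1626334203}
```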

Error displayed:

```
root = parquet_thrift.SchemaElement(name=b'schema',
AttributeError: module 'fastparquet.parquet_thrift' has no attribute 'SchemaElement'
```

Environment where it fails: Python 3.6.9, pyarrow 5.0.0, fastparquet 0.8.0, numpy 1.19.5, pandas 1.1.5.

I tried the exact same snippet with Python 3.7.13, pyarrow 7.0.0, fastparquet 0.8.0, numpy 1.21.5 and pandas 1.3.5, and it worked, but I need it to work with Python 3.6.9. I also tried to install those newer versions explicitly under Python 3.6.9, but the dependencies failed to install.

What I want is to make the above code snippet compatible with Python 3.6.9.


Dulshan On 03 April 2022 at 22:40 BEST ANSWER

Use fastparquet 0.7.2. Even though fastparquet 0.8.0 is compatible with Python 3.6, it looks like it requires a pyarrow version newer than 5.0.0 to work properly, so I had to downgrade fastparquet to 0.7.2 to make it compatible with pyarrow 5.0.0.
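The version combinations reported in this thread can be kept next to the code as a small lookup, for example to print which fastparquet pin to install. This is a hypothetical helper (`REPORTED` and `fastparquet_pin` are illustrative names), and the table holds only the two combinations described in the question and this answer, not an official compatibility matrix.

```python
import sys
from typing import Dict, Optional, Tuple

# Hypothetical lookup built only from the combinations reported in this Q&A:
# (Python major.minor, pyarrow version) -> fastparquet version that worked.
REPORTED: Dict[Tuple[Tuple[int, int], str], str] = {
    ((3, 6), "5.0.0"): "0.7.2",  # fastparquet 0.8.0 raised the SchemaElement error here
    ((3, 7), "7.0.0"): "0.8.0",  # the same snippet worked with fastparquet 0.8.0 here
}


def fastparquet_pin(python_mm: Tuple[int, int], pyarrow_version: str) -> Optional[str]:
    """Return a pip requirement string for a reported combination, or None if it is unknown."""
    version = REPORTED.get((python_mm, pyarrow_version))
    return None if version is None else "fastparquet==" + version


# Check the running interpreter against the pyarrow version pinned in the question.
print(fastparquet_pin(tuple(sys.version_info[:2]), "5.0.0"))
```

On the Python 3.6.9 / pyarrow 5.0.0 setup this returns `fastparquet==0.7.2` (i.e. `pip install fastparquet==0.7.2`); unknown combinations return `None` rather than a guess.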

Note: this code snippet can be used to write a Parquet file in which every column is a string column, including columns that are entirely null, without those null columns being converted to float. That conversion is the default behavior when pandas is used with pyarrow to save a DataFrame to Parquet.
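The idea behind `dtype='str'` together with `columns=actual_column_list` can be illustrated without pandas: every expected column is present in every row, present values become strings, and missing ones stay null instead of becoming a numeric NaN. The helper below (`to_string_rows`) is a hypothetical pure-Python sketch of that normalization only; how the nulls end up typed in the Parquet file still depends on the engine, as the note above describes.

```python
from typing import Any, Dict, List, Optional


def to_string_rows(flat_rows: List[Dict[str, Any]], columns: List[str]) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: keep exactly `columns`, stringify present values, leave missing ones as None."""
    out: List[Dict[str, Optional[str]]] = []
    for row in flat_rows:
        # A column absent from the row (or explicitly None) stays None rather than becoming NaN
        out.append({col: (None if row.get(col) is None else str(row[col])) for col in columns})
    return out


columns = ["_id", "dob.timestamp", "column_10"]
rows = [{"_id": "60efe3333333445", "dob.timestamp": 412214400}, {"_id": "60efe3333333487"}]
print(to_string_rows(rows, columns))
# [{'_id': '60efe3333333445', 'dob.timestamp': '412214400', 'column_10': None}, {'_id': '60efe3333333487', 'dob.timestamp': None, 'column_10': None}]
```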
