I have about 200 pandas DataFrames, and each one has some unique columns, or even a completely different set of columns. Example:
import pandas as pd

# Product inventory
df1 = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Orange', 'Mango'],
    'Quantity': [10, 15, 12, 8],
    'Price': [2.5, 1.5, 2, 3],
    'Category': ['Fruit', 'Fruit', 'Fruit', 'Fruit']
})
# Student records
df2 = pd.DataFrame({
    'Student Name': ['John', 'Emma', 'Lisa', 'Tom'],
    'Age': [18, 17, 19, 18],
    'Grade': ['A', 'B', 'A', 'B'],
    'City': ['New York', 'London', 'Paris', 'Sydney']
})
# Stock prices
df3 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Company': ['AAPL', 'GOOG', 'AMZN', 'MSFT'],
    'Price': [132.69, 1760.33, 3187.50, 215.41]
})
# and many more
I thought I could simply switch to Parquet and put everything in one folder, but it turns out this doesn't work if the Parquet files have different schemas (I haven't actually tried it, so I may be wrong).
I have already read this post: Storing multiple dataframes of different widths with Parquet?
So which formats allow storing multiple DataFrames in one file, other than Excel?
Note: I'm looking into to_orc() and the ORC format, but I don't know whether I can merge different schemas and cut off the NA values.
Note 2: this may not be an answerable question, but sharing related topics and links would also help.
You can use HDF5. Install pytables first with pip install tables.
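As a rough sketch of why HDF5 fits this case: every DataFrame is stored under its own key in a single file, so the schemas never need to match and no NA padding is introduced. The key names ("products", "students"), the file name frames.h5, and the shortened frames are assumptions made for illustration, not part of the question's data.

```python
import os
import tempfile

import pandas as pd  # HDF5 support in pandas needs the PyTables package ("tables")

# Two frames with unrelated columns, shortened from the question's examples.
frames = {
    "products": pd.DataFrame({"Product": ["Apple", "Banana"], "Quantity": [10, 15]}),
    "students": pd.DataFrame({"Student Name": ["John", "Emma"], "Age": [18, 17]}),
}

# Hypothetical file name; a temporary directory keeps the sketch self-cleaning.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.h5")

    # Write each DataFrame under its own key in the same file.
    with pd.HDFStore(path, mode="w") as store:
        for key, df in frames.items():
            store.put(key, df)
        stored_keys = sorted(store.keys())  # keys come back with a leading "/"

    # Read a single frame back by key, without touching the others.
    students = pd.read_hdf(path, key="students")
```

With 200 frames, the same loop works as long as each one gets a distinct key; the dictionary of frames here just stands in for however the DataFrames are actually collected.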
Check: