populating elements of dict using pandas.read_pickle() results in killed python process


On an Ubuntu 18.04.5 image running on AWS, I've noticed that attempting to populate a dict with multiple (7, in my case) dataframes loaded via pandas.read_pickle(), e.g., using something like

import pathlib
import pandas as pd
df_dict = {}
base_dir = pathlib.Path('some_path')
for f in sorted(base_dir.glob('*.pkl')):
    print(f)
    df_dict[f.stem] = pd.read_pickle(f)

results in the Python process being killed before all of the dataframes are loaded. What's odd is that if I read the files but don't assign them to elements of the dict, the loop completes successfully. Also, if I load the exact same dataframes stored in some other file format such as Feather (using pandas.read_feather()), the loop completes successfully.

Any thoughts as to what could be going on? I'm using Python 3.8.8 and Pandas 1.2.4 installed via conda; I've also tried Python 3.8.15 with Pandas 1.5.2 and the same pickle files, and the result is the same. When I try to replicate the problem with the same Python and Pandas versions on macOS 12.6.2 (also installed via conda), the problem does not occur.
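A process being "killed" on Linux (but not macOS) often points to the kernel OOM killer, so one way to narrow this down is to track cumulative in-memory size as each pickle loads. The sketch below is a self-contained, hypothetical reproduction: it writes a few small pickles to a temporary directory (standing in for the real `some_path` files, which aren't shown in the question) and sums `DataFrame.memory_usage(deep=True)` after each load.

```python
import pathlib
import tempfile

import pandas as pd

# Stand-in for the real data: write a few small pickles to a temp dir.
base_dir = pathlib.Path(tempfile.mkdtemp())
for i in range(3):
    pd.DataFrame({"x": range(1000)}).to_pickle(base_dir / f"df_{i}.pkl")

# Load each pickle into the dict, printing the running in-memory total;
# if this total approaches the machine's RAM, the OOM killer is a likely
# explanation for the killed process.
df_dict = {}
total_bytes = 0
for f in sorted(base_dir.glob("*.pkl")):
    df_dict[f.stem] = pd.read_pickle(f)
    total_bytes += int(df_dict[f.stem].memory_usage(deep=True).sum())
    print(f, total_bytes)
```

If the cumulative total stays well below available memory yet the process is still killed, checking `dmesg` for OOM-killer messages would help confirm or rule out that explanation.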
