I am new to using HDF5 files and I am trying to read files with shapes of (20670, 224, 224, 3). Whenever I try to store the results from the hdf5 into a list or another data structure, it either takes so long that I abort the execution or it crashes my computer. I need to read three sets of HDF5 files, use their data, manipulate it, train a CNN model with it, and make predictions.
Any help for reading and using these large HDF5 files would be greatly appreciated.
Currently this is how I am reading the hdf5 file:
import os
import h5py

db = h5py.File(os.getcwd() + "/Results/Training_Dataset.hdf5", "r")
training_db = list(db['data'])
Crashes probably mean you are running out of memory. Like Vignesh Pillay suggested, I would try chunking the data and working on a small piece of it at a time. If you are using the pandas method read_hdf, you can use the iterator and chunksize parameters to control the chunking:
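A minimal sketch of that pattern (the key name "data" and the chunk size here are illustrative assumptions, not taken from your code):

import pandas as pd

# Iterate over the file in manageable chunks instead of loading it all.
# key="data" and chunksize=500 are assumptions; adjust to your file.
for chunk in pd.read_hdf("Results/Training_Dataset.hdf5", key="data",
                         iterator=True, chunksize=500):
    print(chunk.shape)  # replace with your per-chunk processing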
Note this requires the HDF file to be in table format.
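Also, since your question opens the file with h5py directly: h5py datasets support lazy slicing, so you can iterate over the (20670, 224, 224, 3) array in batches without ever materializing the whole thing in memory. A sketch along those lines (the batch size is an arbitrary assumption):

import h5py

batch_size = 256  # illustrative; tune to your memory budget

with h5py.File("Results/Training_Dataset.hdf5", "r") as db:
    dset = db["data"]  # a handle to the dataset; nothing is loaded yet
    for start in range(0, dset.shape[0], batch_size):
        batch = dset[start:start + batch_size]  # reads only this slice from disk
        # ... preprocess `batch` or feed it to your model here ...

For the CNN training step, the same loop can be wrapped in a Python generator and passed to model.fit, so only one batch is resident in memory at a time.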