I have a .h5 file with about 2 million 256x256 images. The data doesn't fit into memory, which is why I am using a generator. I am wondering whether I am iterating over the .h5 file in the correct way (currently using the h5py package).
(I ask because the model is training very slowly, around 680 ms/step, but that could also have other causes.)
The code for the generator:
import h5py
import numpy as np

class Generator:
    def __init__(self, file_path):
        # Only store the path; the file is opened lazily in __call__
        # self.data = h5py.File(file_path, 'r')
        self.file_path = file_path

    def __call__(self):
        with h5py.File(self.file_path, 'r') as data:
            for key in data.keys():
                obj = data[key]
                # X = np.array(obj['X'][()])
                Y = np.array(obj['Y'][()]) * normalization_factor
                Y = Y.reshape(256, 256, 1)
                yield (Y, Y)
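One alternative I have been considering (but have not benchmarked) is storing all images in a single HDF5 dataset and reading them in slices, since one small read per image from separate groups may be slow. A rough sketch, assuming a hypothetical dataset named 'Y' of shape (N, 256, 256), which is not my current file layout:

import h5py
import numpy as np

class ChunkedGenerator:
    """Sketch: yield images by slicing one large HDF5 dataset.

    Assumes the file holds a single dataset 'Y' of shape (N, 256, 256);
    the dataset name and chunk_size are placeholders.
    """
    def __init__(self, file_path, chunk_size=256):
        self.file_path = file_path
        self.chunk_size = chunk_size

    def __call__(self):
        with h5py.File(self.file_path, 'r') as data:
            y = data['Y']  # hypothetical single dataset holding all images
            for start in range(0, y.shape[0], self.chunk_size):
                # One HDF5 read per chunk instead of one per image
                block = y[start:start + self.chunk_size] * normalization_factor
                for img in block:
                    img = img.reshape(256, 256, 1).astype(np.float32)
                    yield (img, img)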
The code for creating the dataset with tf.data.Dataset.from_generator():
import tensorflow as tf

def dataset(path_to_data, batch_size):
    return tf.data.Dataset.from_generator(
        Generator(path_to_data),
        output_signature=(
            tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32),
            tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32),
        ),
    ).batch(batch_size)
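For reference, here is a variant of the same function I am considering, which only adds prefetching so input loading can overlap with the training step (assuming tf.data.AUTOTUNE is available in my TensorFlow version):

import tensorflow as tf

def dataset(path_to_data, batch_size):
    return (
        tf.data.Dataset.from_generator(
            Generator(path_to_data),
            output_signature=(
                tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32),
                tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32),
            ),
        )
        .batch(batch_size)
        # Prepare the next batches while the GPU works on the current one
        .prefetch(tf.data.AUTOTUNE)
    )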