h5py slow when reading through an s3fs file object


I am using the following combination of h5py and s3fs to read a couple of small datasets from larger HDF5 files on Amazon S3.

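import h5py
import s3fs

s3 = s3fs.S3FileSystem()
# s3_path is the S3 location of the HDF5 file; dataset is the name of the dataset to read
h5_file = h5py.File(s3.open(s3_path, 'rb'), 'r')
data = h5_file.get(dataset)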

These reads are relatively slow: reading a single dataset this way takes about as long as copying the entire file from the S3 bucket to local disk and then reading the dataset from there. I assume the reason is that the seek and read calls h5py issues through s3fs carry a lot of overhead.
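One way to check that assumption is to count the seek and read calls that actually reach the file object. The sketch below is a minimal, hypothetical wrapper (the `CountingFile` name and its counters are made up for illustration, not part of h5py or s3fs). It is demonstrated here on an in-memory buffer so that it runs without S3 access.

```python
import io


class CountingFile:
    """Hypothetical wrapper that counts seek/read calls on a binary file object."""

    def __init__(self, raw):
        self._raw = raw
        self.seeks = 0
        self.reads = 0
        self.bytes_read = 0

    def seek(self, offset, whence=io.SEEK_SET):
        self.seeks += 1
        return self._raw.seek(offset, whence)

    def read(self, size=-1):
        data = self._raw.read(size)
        self.reads += 1
        self.bytes_read += len(data)
        return data

    def readinto(self, buffer):
        n = self._raw.readinto(buffer)
        self.reads += 1
        self.bytes_read += n or 0
        return n

    def __getattr__(self, name):
        # Delegate everything else (tell, close, ...) to the wrapped object.
        return getattr(self._raw, name)


# Stand-in for the S3 file object: a 1 KiB in-memory buffer.
f = CountingFile(io.BytesIO(b"x" * 1024))
f.seek(100)
chunk = f.read(16)
print(f.seeks, f.reads, f.bytes_read)
```

In the setup from the question, the intended use would be to pass `CountingFile(s3.open(s3_path, 'rb'))` to `h5py.File` and inspect the counters after reading the dataset. Many small reads relative to the dataset size would support the overhead explanation, although this has not been verified against a real bucket.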

Does anyone have an idea for a faster approach? Downloading the file first and then reading it is faster when I want to read multiple datasets, but it is still far too slow.

Thanks!

Emmanuel
