How to compare multiple hdf5 files


I have multiple HDF5 files (pixel-level annotations) for one image. The image masks are stored in the HDF5 files as key-value pairs, with the key being the id of some class. The masks all match the dimensions of their corresponding image and represent labels for pixels in the image. I need to compare all the h5 files with one another and find the final mask that represents the majority. But I don't know how to compare multiple h5 files in Python. Can someone kindly help?

kcw78 On BEST ANSWER

What do you mean by "compare"?

If you just want to check whether the files are identical, you can use the h5diff utility from The HDF5 Group. It comes with the HDF5 installer. You can get more info in the h5diff utility documentation; links to all HDF5 utilities are at the top of the HDF5 Tools page.
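If you would rather do the same kind of check from Python, a minimal sketch follows. It is not a replacement for h5diff: it only tests whether two files hold the same dataset names with identical values. The function name same_datasets is hypothetical, and the sketch assumes every dataset sits at the root of the file (no nested groups).

```python
import numpy as np


def same_datasets(a, b):
    """Return True if two flat mappings of name -> array hold identical data.

    Works on open h5py.File objects (assuming all datasets sit at the root,
    with no nested groups) and on plain dicts of NumPy arrays.
    """
    if set(a.keys()) != set(b.keys()):
        return False
    # np.asarray reads an h5py dataset into memory;
    # np.array_equal checks both shape and values.
    return all(np.array_equal(np.asarray(a[k]), np.asarray(b[k])) for k in a.keys())


# Usage with real files (file names are placeholders):
#   import h5py
#   with h5py.File('a.h5', 'r') as fa, h5py.File('b.h5', 'r') as fb:
#       print(same_datasets(fa, fb))
```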

It sounds like you need to do more than that. Please clarify what you mean by "find out the final mask that represents the majority". Do you want to find the average image size (either mean, median, or mode)? If so, it is relatively straightforward (if you know Python) to open each file and get the dimensions of the image data (the shape of each dataset, which is what you call the values). For reference, the key, value terminology is how h5py refers to HDF5 dataset names and datasets.

Here is a basic outline of the process to open one HDF5 file and loop through the datasets (by key name) to get each dataset's shape (image size). For multiple files, you can add a for loop using the glob.iglob iterator to get the HDF5 file names. For simplicity, I saved the shape values to 3 lists and manually calculated the mean (sum()/len()). If you want to calculate the mask differently, I suggest using NumPy arrays; NumPy has mean and median functions built in. For mode, you need the scipy.stats module (it works on NumPy arrays).
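As a rough sketch of the multi-file variant just described (the single-file methods follow below), the loop over glob.iglob and the NumPy statistics could look like this. The function names collect_shapes and shape_stats and the '*.h5' pattern are hypothetical, and the sketch assumes all datasets sit at the root of each file and have the same number of dimensions.

```python
from glob import iglob

import numpy as np


def collect_shapes(pattern):
    """Gather the shape of every root-level dataset in every file matching pattern."""
    import h5py  # imported here so shape_stats can be used without h5py installed

    shapes = []
    for filename in iglob(pattern):
        with h5py.File(filename, "r") as h5f:
            for name, ds in h5f.items():
                shapes.append(ds.shape)  # assumes ds is a dataset, not a group
    return shapes


def shape_stats(shapes):
    """Return (mean, median) per axis for a list of equal-rank shape tuples."""
    arr = np.array(shapes)  # one row per dataset, one column per axis
    return arr.mean(axis=0), np.median(arr, axis=0)


# Usage (the pattern is a placeholder):
#   mean, median = shape_stats(collect_shapes('*.h5'))
```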

Method 1: iterates on .keys()

import h5py

filename = 'masks.h5'  # hypothetical path; replace with your own file

s0_list = []
s1_list = []
s2_list = []
with h5py.File(filename, 'r') as h5f:
    for name in h5f.keys():
        shape = h5f[name].shape  # assumes 3-D datasets at the root of the file
        s0_list.append(shape[0])
        s1_list.append(shape[1])
        s2_list.append(shape[2])

print('Ave len axis=0:', sum(s0_list) / len(s0_list))
print('Ave len axis=1:', sum(s1_list) / len(s1_list))
print('Ave len axis=2:', sum(s2_list) / len(s2_list))

Method 2: iterates on .items()

import h5py

filename = 'masks.h5'  # hypothetical path; replace with your own file

s0_list = []
s1_list = []
s2_list = []
with h5py.File(filename, 'r') as h5f:
    for name, ds in h5f.items():
        shape = ds.shape  # assumes 3-D datasets at the root of the file
        s0_list.append(shape[0])
        s1_list.append(shape[1])
        s2_list.append(shape[2])

print('Ave len axis=0:', sum(s0_list) / len(s0_list))
print('Ave len axis=1:', sum(s1_list) / len(s1_list))
print('Ave len axis=2:', sum(s2_list) / len(s2_list))
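
If "the mask that represents the majority" instead means a per-pixel vote across the annotation files, one possible sketch is below. This is only a guess at the intent: it assumes each mask is binary (nonzero means the pixel belongs to that class), that all masks for a class have the same shape, and that a file without a given class id votes "absent" for every pixel. The function name majority_per_class is hypothetical.

```python
import numpy as np


def majority_per_class(files):
    """Per-pixel majority vote for each class id.

    files: list of mappings from class id to mask, e.g. open h5py.File
    objects or plain dicts of NumPy arrays.
    Returns a dict mapping each class id to a boolean majority mask.
    """
    class_ids = set().union(*(f.keys() for f in files))
    result = {}
    for cid in class_ids:
        # Masks from the files that contain this class id, as booleans.
        present = [np.asarray(f[cid]).astype(bool) for f in files if cid in f]
        # Count votes per pixel; files lacking this class contribute no votes.
        votes = np.sum(present, axis=0)
        # Strict majority over ALL files; ties resolve to False.
        result[cid] = votes * 2 > len(files)
    return result
```

With h5py, the files list would be built by opening each file (for example inside a contextlib.ExitStack) and passing the open File objects; the resulting masks can then be written to a new HDF5 file with create_dataset.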