Here is a simplified extraction of what I intend to do:
import numpy as np

loop1 = range(10)
loop2 = range(10)
loop3 = range(100)
combos = []                      # renamed from "list" to avoid shadowing the builtin
for l in loop1:
    for n in loop2:
        for m in loop3:
            combos.append([l, n, m])

dSet = []
for l in combos:
    matrix = np.ones((600, 600))
    matrix = l[2] * matrix
    dSet.append(matrix)
Since there will be 10,000 matrices of size 600x600, dSet cannot hold that much data in memory and the process eventually runs out of RAM. So I would like to use h5py (HDF5) to store dSet and flush it to disk every 100 loop iterations. Is there any decent solution?
Thanks so much.
Sure, you can do this, but it depends on what you want to do:
Do you want to store each of the 10,000 600x600 matrices in its own dataset, or do you want one huge matrix (6,000,000 x 600)?
In the first case you create its own dataset for each matrix with
dset = f.create_dataset("init", data=myData)
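For example, a minimal sketch of that approach (the file name and the "matrix_%05d" dataset naming scheme are just placeholders):

import itertools
import numpy as np
import h5py

with h5py.File("matrices.h5", "w") as f:
    for i, (l, n, m) in enumerate(itertools.product(range(10), range(10), range(100))):
        matrix = m * np.ones((600, 600))
        # each matrix is written straight to disk as its own dataset,
        # so nothing accumulates in memory
        f.create_dataset("matrix_%05d" % i, data=matrix)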
In the second case you have to loop over the data and write it in chunks after you have created the dataset. Something along these lines:
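(A rough sketch, assuming the total shape of 6,000,000 x 600 is known up front; the file and dataset names are placeholders.)

import itertools
import numpy as np
import h5py

with h5py.File("big_matrix.h5", "w") as f:
    # the full size (10*10*100 matrices of 600 rows each) is known in advance
    dset = f.create_dataset("data", shape=(10 * 10 * 100 * 600, 600),
                            dtype="f8", chunks=(600, 600))
    row = 0
    for l, n, m in itertools.product(range(10), range(10), range(100)):
        matrix = m * np.ones((600, 600))
        # write one 600x600 block at a time; it can be garbage-collected afterwards
        dset[row:row + 600, :] = matrix
        row += 600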
This only works if you know the total size in advance. If you don't, you can use extendable datasets (see here for more details).
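If you go that route, a resizable dataset can be grown as the data arrives (again only a sketch; the names and the number of iterations are placeholders):

import numpy as np
import h5py

with h5py.File("growing.h5", "w") as f:
    # maxshape=(None, 600) makes the first axis extendable
    dset = f.create_dataset("data", shape=(0, 600), maxshape=(None, 600),
                            dtype="f8", chunks=(600, 600))
    for m in range(100):                 # however many matrices actually turn up
        matrix = m * np.ones((600, 600))
        dset.resize(dset.shape[0] + 600, axis=0)   # grow by one block
        dset[-600:, :] = matrix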