How to use a ndarray of stored ndarrays with memmap as a big ndarray tensor

137 Views Asked by At

I recently started to use numpy memmap to link an array in my project since I have a 3 dimensions tensor for a total of 133 billions values for a graph of the dataset I am using as example.

I am trying to calculate the heat kernel signature of a 5748 nodes graph (21st of DD dataset). My code to calculate the projectors (where I use memmap) is:

Path('D:/hks_temp').mkdir(parents=True, exist_ok=True)
for l, ll in enumerate(L):
    pl = np.zeros((n, n))
    for k in ll:
        pl += np.outer(evecs[:, k], evecs[:, k])
    fp = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='w+', shape=(n, n))
    fp[:] = pl[:]
    fp.flush()

inside all the X_hks.npy there is a n by n ndarray (from the example 5748 * 5748).

Then I want all these computed arrays to form the 3 dimension tensor so I "link" (I don't know if it's the right term) them in this way:

P = np.array([None] * len(L))    # len(L) = 4043
for l in range(len(L)):
    P[l] = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='r', shape=(n, n))

P is used later only to do inside a cycle H = np.einsum('ijk,i->jk', P, np.exp(-unique_eval * t)).

However, that raises an error: ValueError: einstein sum subscripts string contains too many subscripts for operand 0. Since the method is correct for smaller graphs that doesn't require memmap, my thought was that P isn't well structured for numpy and I must arrange the data, maybe doing a reshape. So I tried to do a P.reshape(len(L), n, n) but it doesn't work giving ValueError: cannot reshape array of size 4043 into shape (4043,5748,5748). How can I make it work?

I already found this question but it doesn't fit this case. I think I can't store all inside one big object since it did 497GB of memmap files (126MB each). If I can do it, please tell me.

If it is impossible to do it I will reduce the use case, however I am quite interested to make it work for all the possibilities.

0

There are 0 best solutions below