I am trying to minimize this function using scipy minimize. I expect to get back an array of the same size as the input, i.e. (92445,), but instead an array of size (92445, 92445) gets allocated and I get the following error:
MemoryError: Unable to allocate 63.7 GiB for an array with shape (92445, 92445) and data type float64
I don't know whether I have simply misunderstood how minimize works or whether there is another problem that I am not seeing.
import numpy as np
from scipy.optimize import minimize

X = np.asarray(freezer.dropna())
Xc = X
D = np.random.random((5, 18489))
Z = np.random.random((5, 5))
B = np.ones((5, 5))
u = 0.5
us = np.sqrt(u)
l = 0.1
def P1(Di):
    Di = Di.reshape(D.shape)
    A = Z - np.matmul(Di, Xc) - B
    return np.linalg.norm(A, 'fro')**2

def P2(Xi):
    Xi = Xi.reshape(Xc.shape)
    A1 = np.vstack((X, us*(Z - B)))
    A2 = np.vstack((np.identity(18489), us*D))
    A3 = np.matmul(A2, Xi)
    A4 = A1 - A3
    return np.linalg.norm(A4, 'fro')**2

def P3(Zi):
    Zi = Zi.reshape(Z.shape)
    A = Zi - np.matmul(D, Xc) - B
    return np.linalg.norm(A, 'fro')**2 + (l/u)*np.linalg.norm(Zi, 1)
Df = D.flatten()
Df = minimize(P1, Df).x          # minimize returns an OptimizeResult; .x is the solution vector
D = Df.reshape(D.shape)

Xcf = Xc.flatten()
Xcf = minimize(P2, Xcf).x
Xc = Xcf.reshape(Xc.shape)

Zf = Z.flatten()
Zf = minimize(P3, Zf).x
Z = Zf.reshape(Z.shape)
B = Z - np.matmul(D,Xc) - B
Nonlinear approach
For the case of P1, notice that each row of D affects only the corresponding row of A. Therefore, you can optimize each row of D separately and combine the results at the end into a single D matrix. Solving an 18489-dimensional problem five times is much easier than solving a 92445-dimensional problem once, so this is both faster and uses less memory.
I also found it necessary to use L-BFGS-B, a solver that uses less memory than the default you are currently using, BFGS. BFGS builds a dense (n, n) approximation of the inverse Hessian, which is exactly where the (92445, 92445) array in your error comes from; L-BFGS-B keeps only a limited history of updates instead.
Code:
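Roughly, the row-by-row version looks something like this (a sketch reusing the D, Xc, Z, and B defined in the question, not a verbatim solution):

import numpy as np
from scipy.optimize import minimize

# Sketch: optimize each row of D separately with L-BFGS-B. Row i of
# A = Z - D @ Xc - B depends only on row i of D, so the (5, 18489)
# problem splits into five independent 18489-variable problems.
def P1_row(d_row, z_row, b_row):
    a = z_row - d_row @ Xc - b_row
    return np.dot(a, a)                      # squared residual for this row

rows = []
for i in range(D.shape[0]):
    res = minimize(P1_row, D[i], args=(Z[i], B[i]), method='L-BFGS-B')
    rows.append(res.x)
D = np.vstack(rows)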
This is sufficient to solve this problem in under a gigabyte of memory, to get a Frobenius norm of ~10^-9. Since 0 is the best possible result, this is pretty close.
I think a similar approach could be used for P2 and P3, though I haven't worked through it in detail.
Linear approach
Another approach, which is much faster, is to reformulate the problem as a linear regression.
Here is a short sketch of a proof.
Taking the square of the Frobenius norm is equivalent to summing the squares of each element of A. Since A = Z - np.matmul(Di, Xc) - B, this is equivalent to minimizing the squared difference between np.matmul(Di, Xc) and (Z - B). Split this into two pieces. Let Y = (Z - B). In order to do a standard linear regression, the parameters must be on the right, but in the expression np.matmul(Di, Xc) they are on the left. Fix this by transposing the inputs and swapping them to get np.matmul(Xc.T, Di.T). We also need to transpose Y, so Y = (Z - B).T. We are now trying to find np.matmul(Xc.T, Di.T) ~ Y. This is an ordinary least-squares problem, and we can use lstsq(), which is much faster than minimize().

Code:
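Something along these lines (again a sketch using the question's arrays):

import numpy as np

# Sketch: solve  Xc.T @ Di.T ~ (Z - B).T  by ordinary least squares.
Y = (Z - B).T                                  # right-hand side, shape (5, 5)
Dt = np.linalg.lstsq(Xc.T, Y, rcond=None)[0]   # Xc.T has shape (5, 18489)
D = Dt.T                                       # back to the (5, 18489) layout

print(np.linalg.norm(Z - np.matmul(D, Xc) - B, 'fro')**2)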
Not only can this solve the problem in under a millisecond, it also finds a smaller objective value than the minimize() solution. I think you could apply this to P2 and P3 as well. P3 looks to be equivalent to P1 except with the addition of an intercept to the linear regression. I don't quite understand P2, but since A1 and A2 are constant with respect to Xi, I think you could do something similar, as sketched below.
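For P2, for example, the objective is already of the form ||A1 - A2 @ Xi||_F^2 with A1 and A2 fixed, so a hypothetical least-squares sketch (slow at this size, since A2 is a dense 18494 x 18489 matrix) would be:

import numpy as np

# Hypothetical sketch for P2: minimize ||A1 - A2 @ Xi||_F^2 directly with lstsq.
# Note that A2 contains the same np.identity(18489) block the question's P2
# already builds, so this still needs a few gigabytes of memory and the
# factorization inside lstsq is slow; treat it as an illustration only.
A1 = np.vstack((X, us * (Z - B)))                       # shape (18494, 5)
A2 = np.vstack((np.identity(18489), us * D))            # shape (18494, 18489)
Xc = np.linalg.lstsq(A2, A1, rcond=None)[0]             # shape (18489, 5)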