I'm running gradient descent to find a root of a system of nonlinear equations, and I'm wondering how to detect whether the method is stuck at a local minimum, because I believe that may be happening with the settings I'm using. My initial values are [-2, -1], with a tolerance of 10^-2 and 20 iterations. One thing I had read is that if the residual begins to flatline or decreases extremely slowly, it could be an indicator that the method is stuck in a local minimum, though I am not entirely sure. I have graphed the residual against the iteration number, as well as the values of my iterates at each iteration, and I'm wondering how I can tell whether it's stuck at a local minimum.
import math
import numpy as np

def system(x):
    # Residual vector F(x) of the two nonlinear equations
    F = np.zeros((2,1), dtype=np.float64)
    F[0] = x[0]*x[0] + 2*x[1]*x[1] + math.sin(2*x[0])
    F[1] = x[0]*x[0] + math.cos(x[0]+5*x[1]) - 1.2
    return F

def jacb(x):
    # Analytic Jacobian dF_i/dx_j of system(x)
    J = np.zeros((2,2), dtype=np.float64)
    J[0,0] = 2*(x[0]+math.cos(2*x[0]))
    J[0,1] = 4*x[1]
    J[1,0] = 2*x[0]-math.sin(x[0]+5*x[1])
    J[1,1] = -5*math.sin(x[0]+5*x[1])
    return J
iterates, residuals = GradientDescent('system', 'jacb', np.array([[-2],[-1]]), 1e-2, 20, 0);
FullGradientDescent.py GradientDescentWithMomentum
I usually test with 20 iterations, but I ran 200 here to illustrate how the residual slows down.
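One way to make the "flat residual" heuristic concrete, sketched here under the assumption that gradient descent is minimising g(x) = ||F(x)||^2: the gradient of g is 2 J^T F, so at a local minimum that is not a root the gradient norm goes to zero while the residual ||F|| stays above the tolerance. The helper name classify_iterate and the threshold grad_tol are hypothetical and not part of the code above.

```python
import numpy as np

def classify_iterate(F, J, res_tol=1e-2, grad_tol=1e-6):
    """Classify an iterate from its residual F(x) and Jacobian J(x).

    Assumes the objective is g(x) = ||F(x)||^2, whose gradient is 2 J^T F.
    res_tol and grad_tol are illustrative thresholds, not tuned values.
    """
    F = np.asarray(F, dtype=float).reshape(-1)   # accept (n,) or (n,1) input
    J = np.asarray(J, dtype=float)
    residual = np.linalg.norm(F)
    grad_norm = np.linalg.norm(2.0 * J.T @ F)
    if residual <= res_tol:
        return "root"
    if grad_norm <= grad_tol:
        return "stationary, not a root"
    return "still descending"

# F is nonzero but J^T F = 0, so the gradient of ||F||^2 vanishes here.
print(classify_iterate([1.0, 0.0], [[0.0, 0.0], [0.0, 1.0]]))
# prints "stationary, not a root"
```

If the residual is nearly flat but the gradient norm is still well away from zero, the more likely explanation is a step size that is too small rather than a local minimum.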

Marat suggested using GD with momentum. Code changes:
dn = 0
gamma = 0.8
dn_prev = 0
while (norm(F,2) > tol and n <= max_iterations):
    J = eval(jac)(x,2,fnon,F,*fnonargs)
    residuals.append(norm(F,2))
    dn = gamma * dn_prev + 2*(np.matmul(np.transpose(J), F))
    dn_prev = dn
    lamb = 0.01
    x = x - lamb * dn
Residual using GD with momentum
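For anyone who wants to reproduce the momentum run without the attached files, here is a minimal self-contained sketch of the same update (dn = gamma * dn_prev + 2 J^T F, then x = x - lamb * dn). The function name gd_momentum and its signature are assumptions of this sketch. It passes the functions directly instead of by name, uses 1-D arrays, and uses the analytic Jacobian, so it is not the GradientDescent routine from the attached files.

```python
import numpy as np

def system(x):
    # Same two equations as in the question, written with 1-D arrays
    return np.array([
        x[0]**2 + 2*x[1]**2 + np.sin(2*x[0]),
        x[0]**2 + np.cos(x[0] + 5*x[1]) - 1.2,
    ])

def jacb(x):
    s = np.sin(x[0] + 5*x[1])
    return np.array([
        [2*(x[0] + np.cos(2*x[0])), 4*x[1]],
        [2*x[0] - s,                -5*s],
    ])

def gd_momentum(f, jac, x0, tol=1e-2, max_iterations=20, lamb=0.01, gamma=0.8):
    """Gradient descent with momentum on g(x) = ||f(x)||^2."""
    x = np.asarray(x0, dtype=float)
    dn_prev = np.zeros_like(x)
    iterates, residuals = [x.copy()], []
    for _ in range(max_iterations):
        F = f(x)
        residuals.append(np.linalg.norm(F))
        if residuals[-1] <= tol:
            break
        # Momentum term plus the gradient 2 J^T F of ||F||^2
        dn = gamma * dn_prev + 2.0 * jac(x).T @ F
        x = x - lamb * dn
        dn_prev = dn
        iterates.append(x.copy())
    return iterates, residuals

iterates, residuals = gd_momentum(system, jacb, [-2.0, -1.0], max_iterations=200)
print(residuals[0], residuals[-1])
```

The returned residuals list can be plotted against the iteration index to get the same kind of residual graph as above.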

lastchance suggested doing a contour plot. This seems to show the behaviour of the algorithm, but it still does not converge?



Your two equations can be written as curves y(x):
y = +-sqrt( (-sin(2x) - x^2) / 2 )
y = ( +-arccos(1.2 - x^2) - x ) / 5
These are the blue and red lines, respectively, on the graph below. Note that both have two branches. The points of intersection are the two roots.
The roots can be found by multi-dimensional Newton-Raphson:
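A minimal sketch of a multi-dimensional Newton-Raphson iteration for this system, where each step solves J(x) dx = -F(x). The function name newton, the tolerance, and the starting guess (-0.8, 0.4) are assumptions of this sketch and not necessarily the starting points referred to below.

```python
import numpy as np

def system(x):
    # The two equations from the question, as a 1-D residual vector
    return np.array([
        x[0]**2 + 2*x[1]**2 + np.sin(2*x[0]),
        x[0]**2 + np.cos(x[0] + 5*x[1]) - 1.2,
    ])

def jacb(x):
    s = np.sin(x[0] + 5*x[1])
    return np.array([
        [2*(x[0] + np.cos(2*x[0])), 4*x[1]],
        [2*x[0] - s,                -5*s],
    ])

def newton(f, jac, x0, tol=1e-10, max_iterations=50):
    """Newton-Raphson for f(x) = 0; returns (x, converged, iterations)."""
    x = np.asarray(x0, dtype=float)
    for n in range(max_iterations):
        F = f(x)
        if np.linalg.norm(F) <= tol:
            return x, True, n
        # Solve J dx = -F instead of forming the inverse explicitly
        x = x + np.linalg.solve(jac(x), -F)
    return x, bool(np.linalg.norm(f(x)) <= tol), max_iterations

# Assumed starting guess near one intersection of the two curves
root, converged, n_iter = newton(system, jacb, [-0.8, 0.4])
print(root, converged, n_iter)
```

Newton-Raphson only converges reliably from a guess reasonably close to a root, which is why the curve plot is useful for choosing starting points.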
For the first starting point you get:
For the second starting point you get: