Gradient descent stuck in local minima?

253 Views Asked by At

I'm running gradient descent to find a root for a system of nonlinear equations and I am wondering how you might detect if the method is stuck at the local minima, because I believe with the settings I am using this might be the case? my initial values are [-2, -1], tolerance of 10^-2 and 20 iterations. One thing I had read upon was that if the residual begins to flat line or begins to decrease incredibly slowly, it could be an indicator of the method being stuck in the local minima though, I am not entirely sure. I have graphed my residual with its iteration as the values of my iterates for each iteration and I'm wondering how I might know if it's stuck at the local minima.

def system(x):
    F = np.zeros((2,1), dtype=np.float64)
    F[0] = x[0]*x[0] + 2*x[1]*x[1] + math.sin(2*x[0])
    F[1] = x[0]*x[0] + math.cos(x[0]+5*x[1]) - 1.2
    return F
 
def jacb(x):
    J = np.zeros((2,2), dtype=np.float64)
    J[0,0] = 2*(x[0]+math.cos(2*x[0]))
    J[0,1] = 4*x[1]
    J[1,0] = 2*x[0]-math.sin(x[0]+5*x[1])
    J[1,1] = -5*math.sin(x[0]+5*x[1])
    return J


iterates, residuals = GradientDescent('system', 'jacb', np.array([[-2],[-1]]), 1e-2, 20, 0);

FullGradientDescent.py GradientDescentWithMomentum

I'm testing usually with 20 iterations but I did 200 to illustrate the slowing down of the residual enter image description here

enter image description here

Marat suggested using GD with momentum. Code changes:

dn = 0
gamma = 0.8
dn_prev = 0
while (norm(F,2) > tol and n <= max_iterations):
    J = eval(jac)(x,2,fnon,F,*fnonargs)
    residuals.append(norm(F,2))
    
    dn = gamma * dn_prev+2*(np.matmul(np.transpose(J), F))
    dn_prev = dn

    lamb = 0.01       
    x = x - lamb * dn

Residual using GD with momentum enter image description here

lastchance suggested doing a contour plot, this seems to show the behaviour of the algorithm but it still does not converge? enter image description here

2

There are 2 best solutions below

0
On

Rewrite your system as follow:

import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt

def system(x):
    return np.array([
        x[0]*x[0] + 2*x[1]*x[1] + np.sin(2*x[0]),
        x[0]*x[0] + np.cos(x[0] + 5*x[1]) - 1.2
    ])

Create some room to plot contour around minima:

xlin = np.linspace(-2, 2, 200)
ylin = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(xlin, ylin)
Z = system([X, Y])

Solve the system around potential solution you have identified:

solutions = np.array([
    optimize.fsolve(system, x0=[0, 0]),
    optimize.fsolve(system, x0=[-1, 0]),
    optimize.fsolve(system, x0=[-1, 0.5])
])

# array([[ 0.        ,  0.        ],
#       [-0.96352546, -0.06643926],
#       [-0.84056293,  0.37906057]])

Confirm returned points actually are solutions:

system(solutions.T).T

# array([[ 0.00000000e+00, -2.00000000e-01],   # Not converged
#       [ 8.88178420e-16,  6.66133815e-16],
#       [-2.22044605e-15, -4.88498131e-15]])

Plot the whole thing to confirm solutions:

fig, axe = plt.subplots()
axe.contour(X, Y, np.log10(np.sqrt(Z[0]**2 + Z[1]**2)), 30, cmap="jet")
axe.plot(*solutions.T, linestyle="none", marker="o")
axe.grid()

enter image description here

The first point (0,0) is not a solution, solver fails:

optimize.fsolve(system, x0=[0, 0], full_output=True)
# (array([0., 0.]),
# {'nfev': 15,
#  'fjac': array([[-1.00000000e+00, -2.54340978e-08],
#         [ 2.54340978e-08, -1.00000000e+00]]),
#  'r': array([-3.12495738e+12,  2.10079410e+06, -5.44853621e-02]),
#  'qtf': array([5.08681955e-09, 2.00000000e-01]),
#  'fvec': array([ 0. , -0.2])},
# 5,
# 'The iteration is not making good progress, as measured by the improvement from the last ten iterations.')

Based on documentation:

The solution (or the result of the last iteration for an unsuccessful call).

Based on those information, you can confirm your code did find one of the two roots of your system over the domains, the additional critical point at (0,0) is not solution.

14
On

Your two equations can be written as curves y(x):

y=+-sqrt( (-sin(2x)-x^2) / 2 )

y = (arccos(1.2-x^2)-x)/5

These are the blue and red lines, respectively, on the graph below. Note that both have two branches. The points of intersection are the two roots. enter image description here The roots can be found by multi-dimensional Newton-Raphson:

import numpy as np
from math import sin, cos

#=======================================================================

def solver( eqns, x ):
    MAXITER = 1000
    TOL = 1.0e-10
    iter, residuals = 1, 1.0
    while iter <= MAXITER:
        print( x )
        F = Function( eqns, x )
        residuals = np.linalg.norm( F )
        if residuals <= TOL: break
        J = Jacobian( eqns, x )
        try: 
            deltax = np.linalg.solve( J, F )
        except np.LinAlgError:
            print( "Unable to solve" )
            break
        x = x - deltax
        iter += 1
    return x, residuals <= TOL

#=======================================================================

def Function( eqns, x ):
    F = np.zeros_like( x )
    for i, f in enumerate( eqns ): F[i] = f( x )
    return F

#=======================================================================

def Jacobian( eqns, x ):
    SMALL = 1.0e-6
    N = len( x )
    J = np.zeros( ( N, N ) )
    for i in range( N ):
       for j in range( N ):
           xplus  = x.copy();   xplus [j] = x[j] + 0.5 * SMALL
           xminus = x.copy();   xminus[j] = x[j] - 0.5 * SMALL
           J[i,j] = ( eqns[i]( xplus ) - eqns[i]( xminus ) ) / SMALL
    return J

#=======================================================================

eqns = [ lambda x: x[0] ** 2 + 2 * x[1] ** 2 + sin( 2 * x[0] ),
         lambda x: x[0] ** 2 + cos( x[0] + 5 * x[1] ) - 1.2   ]
x = np.array( [-0.8, 0.5 ] )
#x = np.array( [-0.8, -0.1 ] )          # alternative root
x, result = solver( eqns, x )
if result:
    print( "Root: ", x )
else:
    print( "No root found" )

#=======================================================================

For the first starting point you get:

[-0.8  0.5]
[-0.85082789  0.38764034]
[-0.8407059   0.37917738]
[-0.84056296  0.37906059]
[-0.84056293  0.37906057]
Root:  [-0.84056293  0.37906057]

For the second starting point you get:

[-0.8 -0.1]
[-1.01262867 -0.06737605]
[-0.96586428 -0.0665185 ]
[-0.96353139 -0.06643953]
[-0.96352546 -0.06643926]
[-0.96352546 -0.06643926]
Root:  [-0.96352546 -0.06643926]