Why do quasi-Newton methods such as DFP and BFGS perform poorly on ill-conditioned problems, even when the objective is quadratic?


I've been reading in the literature that quasi-Newton methods such as DFP and BFGS perform poorly on ill-conditioned problems, but I don't understand why. I have been trying these methods on an ill-conditioned quadratic problem, and they do not converge in p+1 iterations (one of the known properties of quasi-Newton methods on quadratic problems), but take a few more. Why is that? Thank you for your help.

1 Answer

Ill-conditioning is a general problem for first-order optimization algorithms. There are two main issues:

  1. it leads to numerical instability: roundoff errors are amplified and accumulated by the algorithm;
  2. it slows down the convergence rate, because the level sets of the objective are stretched into elongated ellipsoids by the spread of the Hessian's eigenvalues.
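To illustrate the second point, here is a minimal sketch (the eigenvalues, starting point, and step size 1/L are illustrative choices of mine, not from the question) showing how many gradient-descent iterations a diagonal quadratic takes as the condition number grows:

```python
import numpy as np

def gd_iterations(eigs, tol=1e-8, max_iter=100000):
    """Run gradient descent on f(x) = 0.5 * x^T A x with A = diag(eigs),
    step size 1/L (L = largest eigenvalue), and return the iteration
    count needed to drive the gradient norm below tol."""
    A = np.diag(eigs)
    L = max(eigs)
    x = np.ones(len(eigs))
    for k in range(max_iter):
        g = A @ x
        if np.linalg.norm(g) < tol:
            return k
        x = x - (1.0 / L) * g
    return max_iter

well = gd_iterations([1.0, 2.0])     # condition number 2
ill = gd_iterations([1.0, 1000.0])   # condition number 1000
# the ill-conditioned case needs orders of magnitude more iterations
print(well, ill)
```

The per-iteration contraction factor along the slow direction is roughly (1 - 1/kappa), so the iteration count scales with the condition number kappa.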

The standard Newton method, which is a second-order optimization algorithm, can in theory tackle both of these problems: first, it converges much faster, so there is far less opportunity for roundoff error to accumulate; second, the stretched level sets do not hurt it, because it rescales the gradient by the local curvature and therefore moves in a proper direction regardless of the conditioning.
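A quick sketch of why conditioning does not affect the Newton direction on a quadratic: one exact Newton step from any starting point lands on the minimizer, no matter how ill-conditioned the Hessian is (the matrix, right-hand side, and starting point below are arbitrary illustrative choices):

```python
import numpy as np

# Quadratic f(x) = 0.5 * x^T A x - b^T x, minimizer A^{-1} b.
A = np.diag([1.0, 1e6])          # condition number 1e6
b = np.array([1.0, 1.0])
x0 = np.array([5.0, -3.0])

g = A @ x0 - b                       # gradient at x0
x1 = x0 - np.linalg.solve(A, g)      # Newton step: x - H^{-1} grad

# x1 satisfies the optimality condition A x = b up to roundoff
print(np.allclose(A @ x1, b))
```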

However, each Newton step requires solving a linear system with the Hessian (equivalently, applying its inverse). When the condition number is large, the corresponding small eigenvalues blow up under inversion, leading to numerical instability when the step is implemented in a naive way. This issue can still be resolved by using a proper direct solver, or an iterative solver with preconditioning.
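The eigenvalue blow-up is easy to see on a toy diagonal Hessian (the values are illustrative): inverting a matrix with a tiny eigenvalue turns it into a huge one, so any roundoff in that direction is amplified by the same factor.

```python
import numpy as np

H = np.diag([1.0, 1e-8])       # smallest eigenvalue 1e-8
H_inv = np.linalg.inv(H)

print(np.linalg.cond(H))       # condition number ~1e8
print(H_inv[1, 1])             # the eigenvalue 1e-8 becomes 1e8 after inversion
```

This is why a factorization-based direct solver or a preconditioned iterative solver is preferred over forming the explicit inverse.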

Finally, quasi-Newton methods such as DFP and BFGS approximate the inverse Hessian iteratively from gradient differences. They can be more robust to roundoff errors and converge faster than first-order methods, but because they only work with an approximation of the curvature, they may still perform worse than the standard Newton method on ill-conditioned problems.