How to optimize an objective function that contains the "k-biggest" operator?


I want to solve the following optimization problem:

min_{x,y} f(xA + yB)

s.t. x, y >= 0,  x + y = 1

where A and B are square matrices and x and y are scalars. The function is f(A) = sum_i s_i, where s_i is the sum of the k largest entries in the i-th row of A.

For example if

 A= [1  2  3
     5  1  7
     3  4  2]

then f(A) for k=2 would be

(2+3)+(5+7)+(4+3)=24
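This example can be checked with a short numpy snippet (my own illustration, not part of the original question):

```python
import numpy as np

def f(M, k):
    # sum of the k largest entries of each row, summed over all rows
    return np.sort(M, axis=1)[:, -k:].sum()

A = np.array([[1, 2, 3],
              [5, 1, 7],
              [3, 4, 2]])

print(f(A, k=2))  # -> 24, i.e. (2+3) + (5+7) + (4+3)
```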

If I could calculate the derivatives of f with respect to (x, y), the problem would be easy to solve.

Also, I know that one way to deal with "max(a)" in an objective is to introduce a slack variable t, minimize t, and add the constraints a_i <= t. But I couldn't reduce the above problem to something that straightforward.
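For reference, that max(a) trick is a tiny LP; here is a sketch of it using scipy.optimize.linprog (my own illustration):

```python
import numpy as np
from scipy.optimize import linprog

a = np.array([1.0, 4.0, 2.0])

# minimize t  s.t.  a_i <= t for all i
# variable vector: [t]; each constraint row reads -t <= -a_i
res = linprog(c=[1.0],
              A_ub=-np.ones((len(a), 1)),
              b_ub=-a,
              bounds=[(None, None)])
print(res.x[0])  # -> 4.0, i.e. max(a)
```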

In order to check the optimality of my solution, I solved the above problem with Matlab's general-purpose solver, but I need to know how to optimize it myself.

Best answer:

(The question is somewhat unclear: I don't know whether you want to learn how to formulate this as a convex-optimization problem (homework?) or just solve it somehow. You also did not show your own attempt, and other details are missing. Please keep that in mind for your next question.)

Convex-optimization with powerful modelling-languages

cvxpy (Python) (and probably CVX (Matlab) and Convex.jl (Julia)) support this kind of constraint. So the whole problem would look like:

Code:

from cvxpy import *
import numpy as np
np.random.seed(1)

# Constants
A = np.random.normal(size=(5,5))
B = np.random.normal(size=(5,5))
k = 2

# Vars
x = Variable(2)

# Constraints
constraints = [sum_entries(x) == 1,
               x >= 0.0]

# Problem
func_inner = x[0] * A + x[1] * B
func = sum([sum_largest(func_inner[i, :], k) for i in range(A.shape[0])])  # one sum_largest per row
objective = Minimize(func)
problem = Problem(objective, constraints)
problem.solve(verbose=True)

These tools also prove convexity by construction, which is very important for the other approaches we might use later (e.g. projected gradient-descent). This is a convex optimization problem!

Convex-optimization by hand

Let's focus on the formulation of sum_largest(x, k), where x is a vector of variables and k a positive constant.

This can be formulated by introducing some auxiliary variables. Let's fix x and re-formulate:

sum_largest([1, 0, 3], 2):

x = [1, 0, 3]         # input
q = Variable(1)       # aux-var: single-dim var
t = Variable(vec_len) # aux-vars: dim = 3

Min sum(t) + k * q

st. x[0] <= t[0] + q,
    x[1] <= t[1] + q,
    x[2] <= t[2] + q,
    t[0] >= 0,
    t[1] >= 0,
    t[2] >= 0

This reformulation was taken from cvxpy's sources!
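As a sanity check, the same LP can be solved directly with scipy.optimize.linprog (my own verification, not taken from cvxpy):

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([1.0, 0.0, 3.0])
k = 2
n = len(x)

# variables: [q, t_0, ..., t_{n-1}]
c = np.concatenate(([k], np.ones(n)))             # objective: k*q + sum(t)
A_ub = np.hstack((-np.ones((n, 1)), -np.eye(n)))  # -q - t_i <= -x_i
b_ub = -x
bounds = [(None, None)] + [(0, None)] * n         # q free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.fun)  # -> 4.0, the sum of the 2 largest entries (1 + 3)
```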

Test (not the nicest code in terms of cvxpy's capabilities):

from cvxpy import *
import numpy as np

# Constants
x = np.array([-1, -2, -0.5, 1, 0, 3])
k = 4

# Vars
q = Variable()
t = Variable(x.shape[0])

# Constraints
constraints = [x[0] <= t[0] + q,
               x[1] <= t[1] + q,
               x[2] <= t[2] + q,
               x[3] <= t[3] + q,
               x[4] <= t[4] + q,
               x[5] <= t[5] + q,
               t >= 0.0]

objective = Minimize(sum_entries(t) + k * q)

problem = Problem(objective, constraints)
problem.solve(verbose=True)
print(problem.value)

Output:

3.4999999942200737 (IPM-based solver; therefore only an approximation of the optimum)

Projected Gradient-descent

It might come to mind to use some gradient-descent based approach. In that case one needs to choose which kind of gradient computation to use:

  • gradient-calculation by hand
  • automatic differentiation
  • numerical differentiation

I won't go into much detail (see the next paragraph for why), but my attempts using autograd failed due to missing gradient support for multi-dimensional sort (which I used in the function definition).

But: projected gradient-descent works well (and much, much faster than those convex-optimization solvers for large-scale problems), even with numerical differentiation (which calls the function a lot!). I tested a non-monotonic PGD method and it worked well.

There are efficient linear-time projections onto the probability simplex which one would use (Duchi et al., "Efficient Projections onto the l1-Ball for Learning in High Dimensions"). Remember: it's a constrained optimization problem, and plain gradient descent won't work (without reformulation).

Code is omitted.
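For completeness, here is a minimal sketch of the simplex projection from Duchi et al. (my own implementation of the published algorithm, not the author's omitted PGD code):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (Duchi et al.)."""
    u = np.sort(v)[::-1]                   # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]   # largest valid index
    tau = (css[rho] - 1.0) / (rho + 1.0)   # shift that normalizes the support
    return np.maximum(v - tau, 0.0)

print(project_to_simplex(np.array([2.0, 0.0])))  # -> [1. 0.]
print(project_to_simplex(np.array([0.3, 0.3])))  # -> [0.5 0.5]
```

A PGD iteration would then alternate a gradient step on (x, y) with this projection.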

Just solve it? Be clever! Use scalar-minimization!

You have just 2 variables x and y, both in [0, 1], with the constraint x + y == 1.

Those two variables collapse into one, so we can use 1d / scalar optimization, which is much simpler to do!

Instead of optimizing x and y, we only optimize x; y = 1.0 - x is given implicitly!

Of course one would use a bounded method!

Code:

import numpy as np
from scipy.optimize import minimize_scalar
from time import perf_counter
np.random.seed(1)

# Constants
# ---------
k = 2
N = 500
A = np.random.random(size=(N,N))
A_zeros = np.random.choice([0, 1], size=(N, N))  # more interesting data
A *= A_zeros                                     # """
B = np.random.random(size=(N,N))                 # """
B_zeros = np.random.choice([0, 1], size=(N, N))  # """
B *= B_zeros                                     # """

# Function
# --------
#   remark: partition / incomplete sort might be better for large-scale
def func(x):
    return np.sum(np.sort(x*A + (1.-x)*B, axis=1)[:, -k:])  # y=1.0-x !!!

start = perf_counter()
res = minimize_scalar(func, method='bounded', bounds=(0., 1.))
end = perf_counter()
print(res)
print('used: ', end-start)

Output:

     fun: 930.68327527871179
 message: 'Solution found.'
    nfev: 20
  status: 0
 success: True
       x: 0.50511029941339591
   used:  0.20979814249560497
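The remark about partition in func can be made concrete: np.partition does an incomplete sort, placing the k largest entries of each row at the end without fully ordering the rest, which is typically faster for large N (a sketch; it gives the same sum as the sorted version):

```python
import numpy as np
np.random.seed(1)

k = 2
M = np.random.random(size=(500, 500))

top_k_sorted    = np.sum(np.sort(M, axis=1)[:, -k:])       # full sort per row
top_k_partition = np.sum(np.partition(M, -k, axis=1)[:, -k:])  # incomplete sort
print(np.isclose(top_k_sorted, top_k_partition))  # -> True
```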

Keep in mind that everything presented relies on this being a convex problem, where every local minimum is a global minimum, as proven by cvxpy's construction!