I want to solve the following optimization problem:
min_{x,y} f(xA+yB)
s.t. x, y >= 0, x + y = 1
in which A, B are square matrices and x, y are scalars.
The function is f(A) = sum_i(s_i), where s_i is the sum of the k largest entries in the i-th row of A.
For example if
A= [1 2 3
5 1 7
3 4 2]
then f(A) for k=2 would be
(2+3)+(5+7)+(4+3)=24
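For reference, here is a small numpy sketch of how f can be computed (just to make the definition concrete; it is not part of my optimization code):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [5, 1, 7],
              [3, 4, 2]])
k = 2

# f(A): sort each row, take the k largest entries per row, sum everything
f_A = np.sort(A, axis=1)[:, -k:].sum()
print(f_A)  # 24
```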
If I could calculate the derivatives of f with respect to (x, y), then the problem would be easy to solve.
Also, I know that one way to deal with "max(a)" in an objective is to add a slack variable t to the objective and add a <= t to the constraints. But I couldn't make the above problem fit that pattern in a straightforward way.
To check the optimality of the solution, I solve the above problem using MATLAB's general-purpose solver, but I need to know how I can optimize it myself.
(The question is somewhat unclear. I don't know if you want to learn how to formulate this as a convex-optimization problem (homework?) or just solve it somehow. You did not provide your working approach and much more is missing. Please keep that in mind for your next question.)
Convex-optimization with powerful modelling-languages
cvxpy (Python) (and probably CVX (MATLAB) and Convex.jl (Julia)) support this kind of expression. So the whole problem would look like:
Code:
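A minimal cvxpy sketch of what this could look like (the exact code is omitted above; the names n, k and the random data below are my assumptions, not from the question):

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
n, k = 5, 2                      # hypothetical problem size
A = np.random.randn(n, n)        # example data
B = np.random.randn(n, n)

x = cp.Variable()
y = cp.Variable()
M = x * A + y * B                # affine in (x, y)

# f(M): per row, sum the k largest entries, then sum over all rows
objective = cp.Minimize(sum(cp.sum_largest(M[i, :], k) for i in range(n)))
constraints = [x >= 0, y >= 0, x + y == 1]

cp.Problem(objective, constraints).solve()
print(x.value, y.value)
```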
These tools also prove convexity by construction, which is very important for other approaches we might use later (e.g. Projected Gradient-descent). This is a convex optimization problem!
Convex-optimization by hand
Let's focus on the formulation of sum_largest(x, k), x being a vector of variables and k a positive constant. This can be formulated by introducing some aux-vars. Let's fix x and re-formulate sum_largest([1, 0, 3], 2):
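The reformulation itself did not survive above; what I assume was shown is the standard LP construction with a scalar threshold t and a slack vector q:

    sum_largest(x, k) = min_{t, q}  k*t + sum_i(q_i)
                        s.t.        q_i >= x_i - t,  q_i >= 0  for all i

For the fixed example [1, 0, 3] with k = 2 the optimum is 4 (= 3 + 1), attained e.g. at t = 1, q = [0, 0, 2].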
This reformulation was taken from cvxpy's sources!
Test (not the nicest code in terms of cvxpy's capabilities):
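A sketch of such a test could look like this (my reconstruction, not the original snippet; it checks the aux-variable formulation against cvxpy's built-in atom):

```python
import numpy as np
import cvxpy as cp

v = np.array([1.0, 0.0, 3.0])      # fixed vector from the example
k = 2

t = cp.Variable()                  # threshold aux-var
q = cp.Variable(v.shape[0])        # per-entry slack aux-vars

prob = cp.Problem(cp.Minimize(k * t + cp.sum(q)),
                  [q >= v - t, q >= 0])
prob.solve()

print(prob.value)                  # expected: 4.0  (= 3 + 1)
print(cp.sum_largest(v, k).value)  # built-in atom gives the same value
```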
Output omitted.
Projected Gradient-descent
It might come to mind to use some Gradient-descent based approach. In this case one needs to choose which kind of gradient-calculation to use (e.g. automatic differentiation or numerical differentiation).
I'm not going into much detail (see the next paragraph for the why), but my attempts using autograd failed due to missing gradient-support for multi-dim sort (which I used in the function-def).
But: Projected Gradient-descent works well (and is much, much faster than all those convex-optimization solvers for large-scale problems), even with numerical differentiation (which calls the function a lot!). I tested some non-monotonic PGD method and it worked well.
There are efficient linear-time projections onto the probability-simplex which one would use (Duchi et al., "Efficient Projections onto the l1-Ball for Learning in High Dimensions")! Remember: it's a constrained optimization problem and pure GD won't work (without reformulation).
Code is omitted.
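A rough sketch of what such a PGD loop could look like (my own illustration, assuming forward-difference numerical gradients and a fixed step size; the non-monotonic line-search details mentioned above are left out):

```python
import numpy as np

def f(z, A, B, k):
    # objective: per row of z[0]*A + z[1]*B, sum the k largest entries
    M = z[0] * A + z[1] * B
    return np.sort(M, axis=1)[:, -k:].sum()

def project_to_simplex(v):
    # Euclidean projection onto {w : w >= 0, sum(w) = 1}
    # sort-based variant from Duchi et al.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def num_grad(fun, z, eps=1e-6):
    # forward-difference numerical gradient (calls fun len(z)+1 times)
    f0 = fun(z)
    g = np.zeros_like(z)
    for i in range(len(z)):
        zp = z.copy()
        zp[i] += eps
        g[i] = (fun(zp) - f0) / eps
    return g

def pgd(A, B, k, steps=1000, lr=1e-2):
    obj = lambda z: f(z, A, B, k)
    z = project_to_simplex(np.random.rand(2))   # feasible start
    for _ in range(steps):
        z = project_to_simplex(z - lr * num_grad(obj, z))
    return z, obj(z)
```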
Just solve it? Be clever! Use scalar-minimization!
You just got 2 variables x and y, both in [0, 1], with the constraint x + y == 1. Those two vars collapse into one, and therefore we can use 1d-optimization / scalar-optimization, which is much simpler to do!
Instead of optimizing x and y, we just optimize x; y = 1.0 - x is given implicitly! Of course one would use a bounded method!
Code:
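A small sketch with scipy's bounded scalar minimizer (again my illustration with made-up data; n, k, A, B below are assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(M, k):
    # per row: sum the k largest entries; then sum over all rows
    return np.sort(M, axis=1)[:, -k:].sum()

np.random.seed(0)
n, k = 5, 2
A = np.random.randn(n, n)
B = np.random.randn(n, n)

# y = 1 - x is implicit, so only x in [0, 1] remains
res = minimize_scalar(lambda x: f(x * A + (1.0 - x) * B, k),
                      bounds=(0.0, 1.0), method='bounded')
print('x =', res.x, ' y =', 1.0 - res.x, ' f =', res.fun)
```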
Output omitted.
Keep in mind that everything presented uses the fact that it's a convex problem, where every local minimum is a global minimum, which was proven by cvxpy (by construction)!