Maximizing variance of bounded variables


Let x_1, ..., x_p be p real numbers such that 0 <= x_i <= b for all i. That is, each x_i can take any value between 0 and b. I'd like to find the values of the x_i that maximize the variance among them.

Do you have any hints? I'd like to use this result in my example code. Or is this question not well-defined?

At first I thought of something like x_1 = 0, x_2 = ... = x_p = b, but then I found this does not maximize the variance once p gets a little large.
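For instance, a quick numeric check (with b = 10 and p = 10, values picked just for illustration) shows the single-zero configuration losing to an even split:

import numpy as np

print(np.var([0] + [10] * 9))      # one value at 0, nine at b: variance 9.0
print(np.var([0] * 5 + [10] * 5))  # even split between 0 and b: variance 25.0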

Thanks

Accepted answer

After the comments, I did some trials toward a numerical demonstration of your problem. There's still some work to do, but I hope it puts you on the right track. Also, I've used Python; I have no idea whether that is OK for you or not, but you can surely find equivalent ways to do it in MATLAB and R.

I use the well-known identity variance = E[X^2] - E[X]^2 to make the derivatives easier (if you have doubts, check Wikipedia).
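For reference, that identity is just the expansion of the definition:

Var(X) = E[(X - E[X])^2] = E[X^2] - 2*E[X]*E[X] + E[X]^2 = E[X^2] - E[X]^2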

The Python package scipy.optimize has a minimize function for minimizing a function numerically, and it lets you select an algorithm for solving the problem. I'm not so familiar with the possible algorithms; I was looking for well-known plain gradient descent (well, at least I hope you know it), and I think the closest available one is SLSQP, but honestly I'm not 100% sure of the details.

And finally, I didn't make sure that the function you're minimizing is convex, or figure out whether it has local minima, but the results look fine.
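One quick way to probe that, using func as defined below: the objective is quadratic, so its Hessian is a constant matrix, and its eigenvalues settle the question. They come out non-positive, so func is actually concave rather than convex, which is consistent with the maximizer of the variance sitting at a corner of the box (as in the runs further down). For example, with n = 4:

import numpy as np

n = 4  # any length works; 4 matches the example below
H = -2 * (np.eye(n) / n - np.ones((n, n)) / n**2)  # Hessian of func (constant, since func is quadratic)
print(np.linalg.eigvalsh(H))  # [-0.5, -0.5, -0.5, 0.0] -> all <= 0, i.e. concave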

I give you the code in Python below, in case it is useful, but the bottom line is that I'd suggest you:

  • Choose a language/package you are familiar with
  • Choose an algorithm for the optimization
  • Work out whether the function is convex or concave, so you know how far to trust a local solver (see the eigenvalue check above)
  • Set the parameters for which you want to run the demonstration

Code below. Hope it helps.

I'm not going to post the full algebra for the derivatives; I hope you can work them out yourself. And you must take into account that you are maximizing and not minimizing, so you have to multiply by -1, as explained (I hope quite clearly) here (look for "maximizing").
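Still, for reference, here is where that algebra lands (it matches func_deriv below). Writing m = (1/n) * sum_j x_j for the mean,

d/dx_i [ (1/n) * sum_j x_j^2 - m^2 ] = 2*x_i/n - 2*m/n = 2*x_i/n - (2/n^2) * sum_j x_j

and the value handed to the minimizer is (-1) times this.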

Setup,

In [1]:

from scipy.optimize import minimize
import numpy as np

The function you are maximizing, that is, the variance (remember the trick E[X^2] - E[X]^2, and the -1),

In [86]:

def func(x):
    # negative of the variance, via E[X^2] - E[X]^2; the -1 turns
    # the maximization into a minimization
    return (-1) * (np.mean([xi**2 for xi in x]) - np.mean(x)**2)
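(A quick sanity check, using the setup above: np.var computes the population variance, which is exactly E[X^2] - E[X]^2, so -func should agree with it.)

print(np.isclose(-func([0, 0, 10, 10]), np.var([0, 0, 10, 10])))  # True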

The derivative of that function with respect to each x_i of the vector x (I hope you can differentiate and arrive at the same result),

In [87]:

def func_deriv(x):
    n = len(x)
    l = []
    for i in range(n):
        # d(variance)/dx_i = 2*x_i/n - (2/n^2) * sum_j x_j
        res = (2 * x[i] / n) - ((2 / (n**2)) * (x[i] + sum([x[j] for j in range(n) if j != i])))
        # multiplied by -1 because func is the negative of the variance
        l += [(-1) * res]
    return np.array(l)

Actually, I made quite a few mistakes when writing this function, both in the derivative and in the Python implementation. But there is a trick that helps a lot: check the derivative numerically, by adding and subtracting a small epsilon in each dimension and computing the slope of the curve (central differences; see Wikipedia). This would be the function that approximates the derivative,

In [72]:

def func_deriv_approx(x, epsilon=0.00001):
    l = []
    for i in range(len(x)):
        # shift only coordinate i by +/- epsilon
        x_plus = [x[j] + ((j == i) * epsilon) for j in range(len(x))]
        x_minus = [x[j] - ((j == i) * epsilon) for j in range(len(x))]
        # central difference approximation of d(func)/dx_i
        # (no extra -1 here: func already carries the sign flip)
        res = (func(x_plus) - func(x_minus)) / (2 * epsilon)
        l += [res]
    return l

And then I checked func_deriv_approx against func_deriv for a bunch of values.
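For example, one such check at a random point (the point itself is arbitrary):

x0 = np.random.uniform(0, 10, size=4)
print(np.allclose(func_deriv(x0), func_deriv_approx(x0)))  # True if the analytic gradient is right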

And the minimization itself. If I initialize the values to the solution we suspect is right (half the points at 0 and half at b, which for b = 10 and p = 4 should give variance b^2/4 = 25), it works OK: it iterates only once and gives the expected result,

In [99]:

res = minimize(func, [0, 0, 10, 10], jac=func_deriv, bounds=[(0,10) for i in range(4)],
               method='SLSQP', options={'disp': True})
Optimization terminated successfully.    (Exit mode 0)
            Current function value: -25.0
            Iterations: 1
            Function evaluations: 1
            Gradient evaluations: 1

In [100]:

print(res.x)
[  0.   0.  10.  10.]

(Note that you could use any length you want, since func and func_deriv are written in a way that accepts any length.)

You could initialize randomly like this,

In [81]:

import random
xinit = [random.randint(0, 10) for i in range(4)]

In [82]:

xinit
Out[82]:
[1, 2, 8, 7]

And then the maximization is,

In [83]:

res = minimize(func, xinit, jac=func_deriv, bounds=[(0,10) for i in range(4)],
               method='SLSQP', options={'disp': True})
Optimization terminated successfully.    (Exit mode 0)
            Current function value: -25.0
            Iterations: 3
            Function evaluations: 3
            Gradient evaluations: 3
In [84]:

print(res.x)
[  1.27087156e-13   1.13797860e-13   1.00000000e+01   1.00000000e+01]

Or finally, for length = 100 (the e-13 entries in the previous output are just numerical zeros),

In [85]:

import random
xinit = [random.randint(0, 10) for i in range(100)]

In [91]:

res = minimize(func, xinit, jac=func_deriv, bounds=[(0,10) for i in range(100)],
               method='SLSQP', options={'disp': True})
Optimization terminated successfully.    (Exit mode 0)
            Current function value: -24.91
            Iterations: 23
            Function evaluations: 22
            Gradient evaluations: 22
In [92]:

print(res.x)
[  2.49143492e-16   1.00000000e+01   1.00000000e+01  -2.22962789e-16
  -3.67692105e-17   1.00000000e+01  -8.83129256e-17   1.00000000e+01
   7.41356521e-17   3.45804774e-17  -8.88402036e-17   1.31576404e-16
   1.00000000e+01   1.00000000e+01   1.00000000e+01   1.00000000e+01
  -3.81854094e-17   1.00000000e+01   1.25586928e-16   1.09703896e-16
  -5.13701064e-17   9.47426071e-17   1.00000000e+01   1.00000000e+01
   2.06912944e-17   1.00000000e+01   1.00000000e+01   1.00000000e+01
  -5.95921560e-17   1.00000000e+01   1.94905365e-16   1.00000000e+01
  -1.17250430e-16   1.32482359e-16   4.42735651e-17   1.00000000e+01
  -2.07352528e-18   6.31602823e-17  -1.20809001e-17   1.00000000e+01
   8.82956806e-17   1.00000000e+01   1.00000000e+01   1.00000000e+01
   1.00000000e+01   1.00000000e+01   3.29717355e-16   1.00000000e+01
   1.00000000e+01   1.00000000e+01   1.00000000e+01   1.00000000e+01
   1.43180544e-16   1.00000000e+01   1.00000000e+01   1.00000000e+01
   1.00000000e+01   1.00000000e+01   2.31039883e-17   1.06524134e-16
   1.00000000e+01   1.00000000e+01   1.00000000e+01   1.00000000e+01
   1.77002357e-16   1.52683194e-16   7.31516095e-17   1.00000000e+01
   1.00000000e+01   3.07596508e-17   1.17683979e-16  -6.31665821e-17
   1.00000000e+01   2.04530928e-16   1.00276075e-16  -1.20572493e-17
  -3.84144993e-17   6.74420338e-17   1.00000000e+01   1.00000000e+01
  -9.66066818e-17   1.00000000e+01   7.47080743e-17   4.82924982e-17
   1.00000000e+01  -9.42773478e-17   1.00000000e+01   1.00000000e+01
   1.00000000e+01   1.00000000e+01   1.00000000e+01   5.01810185e-17
  -1.75162038e-17   1.00000000e+01   6.00111991e-17   1.00000000e+01
   1.00000000e+01   7.62548028e-17  -6.90706135e-17   1.00000000e+01]
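(If you want to interpret that last output, a quick check along these lines works, using the res from the run above: the e-16 and e-17 entries are numerical zeros, so every point ends up at one of the two bounds.)

x_opt = np.round(res.x)
print(np.sum(x_opt == 10), np.var(x_opt))  # points at the upper bound, and the variance achieved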