quadratic programming in R with more parameters (constraints) than observations

243 Views Asked by At

I want to run a constrained linear regression such that the coefficients are nonnegative and sum to one. Usually this can be done with quadratic programming, However, I have more parameters (constraints) than observations (p > n). How can I do this?

This quadprog-solution only works for problems with n > p:

library(quadprog);
N <- 20
P <- 40
X <- matrix(runif(N*P), ncol=P)
btrue <- c(1,1,rep(0,(P-2)))
Y <- X %*% btrue + rnorm(N, sd=0.2)
C <- cbind(rep(1,P), diag(P))
b <- c(1,rep(0,P))
solve.QP(Dmat = t(X)%*%X, dvec = t(Y)%*%X , Amat = C, bvec = b, meq = 1)
1

There are 1 best solutions below

0
On

You can do this using the following model:

min r'r
r = X'b - y
b >= 0
sum(b) = 1
r free

where b are the parameters to estimate and r are the residuals. Both b and r are decision variables. This problem is always convex (i.e. Q=I is always positive definite) even if n < p.

However, Quadprog may still not like it (I think it wants a strictly pos def D matrix, unlike most QP solvers). Fix this by changing D to:

D = [ 0.0001*I  0  ]
    [ 0         I  ]

The R code can look like:

#
#
#  b = [b]  (P)
#      [r]  (N)
#
#  D =  [ 0 0 ] 
#       [ 0 I ]
#
#  d = [0]
#      [0]
#
#  A' =  [ 1' 0' ]
#        [ I  0  ]
#
# b0 = [1]
#      [0]
#


# fudge left upper sub matrix
D  = rbind( cbind( 0.0001*diag(P), matrix(rep(0,P*N),nrow=P,ncol=N)),
            cbind( matrix(rep(0,N*P),nrow=N,ncol=P), diag(N) ) 
           )
d = rep(0, P+N)

A = rbind( cbind( matrix(rep(1,P),nrow=1), matrix(rep(0,N),nrow=1)),
           cbind(diag(P),matrix(rep(0,P*N),nrow=P,ncol=N)))

b0 = rbind( matrix(c(1),nrow=1,ncol=1), matrix(rep(0,P),nrow=P,ncol=1))


solve.QP(Dmat = D, dvec = d , Amat = t(A), bvec = b0, meq = 1)