I want to run a constrained linear regression such that the coefficients are nonnegative and sum to one. Usually this can be done with quadratic programming, However, I have more parameters (constraints) than observations (p > n). How can I do this?
This quadprog-solution only works for problems with n > p:
library(quadprog);
N <- 20
P <- 40
X <- matrix(runif(N*P), ncol=P)
btrue <- c(1,1,rep(0,(P-2)))
Y <- X %*% btrue + rnorm(N, sd=0.2)
C <- cbind(rep(1,P), diag(P))
b <- c(1,rep(0,P))
solve.QP(Dmat = t(X)%*%X, dvec = t(Y)%*%X , Amat = C, bvec = b, meq = 1)
You can do this using the following model:
where b are the parameters to estimate and r are the residuals. Both b and r are decision variables. This problem is always convex (i.e. Q=I is always positive definite) even if n < p.
However, Quadprog may still not like it (I think it wants a strictly pos def D matrix, unlike most QP solvers). Fix this by changing D to:
The R code can look like: