In R, I am trying to optimize the following: choose the rows that maximize the number of columns whose sum exceeds a certain value (which varies by column), plus some other basic constraints on the row selections.
Is there anything out there in R that lets you incorporate this kind of logic into an objective function? I.e. maximize countif( sum(value column) > target value for column ) over ~10k columns, choosing 5 rows out of ~500 candidate rows.
Simple example: grab the combination of 4 rows below whose column sums exceed the targets more often than any other combination of 4 rows (a brute-force check of this small case is sketched after the table).
+--------+------+------+------+------+------+------+------+------+------+-------+
| x      | col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | col10 |
+--------+------+------+------+------+------+------+------+------+------+-------+
| row1   |   82 |   73 |   50 |   11 |   76 |   12 |   46 |   64 |    5 |    44 |
| row2   |    2 |   33 |   35 |   55 |   52 |   18 |   13 |   86 |   72 |    39 |
| row3   |   94 |    5 |   10 |   21 |   90 |   62 |   54 |   54 |    7 |    17 |
| row4   |   27 |   10 |   28 |   87 |   27 |   83 |   62 |   56 |   54 |    86 |
| row5   |   17 |   50 |   34 |   30 |   80 |    7 |   96 |   91 |   32 |    21 |
| row6   |   73 |   75 |   32 |   71 |   37 |    1 |   13 |   76 |   10 |    34 |
| row7   |   98 |   13 |   87 |   49 |   27 |   90 |   28 |   75 |   55 |    21 |
| row8   |   45 |   54 |   25 |    1 |    3 |   75 |   84 |   76 |    9 |    87 |
| row9   |   40 |   87 |   44 |   20 |   97 |   28 |   88 |   14 |   66 |    77 |
| row10  |   18 |   28 |   21 |   35 |   22 |    9 |   37 |   58 |   82 |    97 |
| target |  200 |  100 |  125 |  135 |  250 |   89 |  109 |  210 |  184 |   178 |
+--------+------+------+------+------+------+------+------+------+------+-------+
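For this small example an exhaustive check is cheap (choose(10, 4) = 210 combinations), so a plain base-R sketch of the objective being described might look like this:

# Data from the table above
vals <- matrix(c(
  82, 73, 50, 11, 76, 12, 46, 64,  5, 44,
   2, 33, 35, 55, 52, 18, 13, 86, 72, 39,
  94,  5, 10, 21, 90, 62, 54, 54,  7, 17,
  27, 10, 28, 87, 27, 83, 62, 56, 54, 86,
  17, 50, 34, 30, 80,  7, 96, 91, 32, 21,
  73, 75, 32, 71, 37,  1, 13, 76, 10, 34,
  98, 13, 87, 49, 27, 90, 28, 75, 55, 21,
  45, 54, 25,  1,  3, 75, 84, 76,  9, 87,
  40, 87, 44, 20, 97, 28, 88, 14, 66, 77,
  18, 28, 21, 35, 22,  9, 37, 58, 82, 97
), nrow = 10, byrow = TRUE)
targets <- c(200, 100, 125, 135, 250, 89, 109, 210, 184, 178)

combos <- combn(nrow(vals), 4)                 # all 210 ways to pick 4 of 10 rows
hits   <- apply(combos, 2, function(rows)
  sum(colSums(vals[rows, ]) > targets))        # columns whose sum beats the target
combos[, which.max(hits)]                      # best 4-row combination
max(hits)                                      # how many targets it exceeds

Of course this enumeration does not scale to 5 out of ~500 rows, which is why a proper optimizer is needed for the real problem.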
EDIT + Update: I implemented the following using ompr, ROI, and some Big-M logic (A[j] flags a column whose target is missed, and s[j] is the corresponding slack).
# Packages needed: ompr for the model, ompr.roi + ROI.plugin.glpk for solving,
# magrittr (or dplyr) for the pipe
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
library(magrittr)

nr <- 10   # number of rows
nt <- 15   # number of target columns
vals <- matrix(sample.int(nr * nt, nr * nt), nrow = nr, ncol = nt)
targets <- rep(4 * mean(vals), nt)

M <- 9999999   # Big-M constant

model <- MIPModel() %>%
  add_variable(x[i], i = 1:nr, type = "binary") %>%              # 1 if row i is selected
  add_constraint(sum_expr(x[i], i = 1:nr) == 4) %>%              # select exactly 4 rows
  add_variable(A[j], j = 1:nt, type = "binary") %>%              # 1 if column j misses its target
  add_variable(s[j], j = 1:nt, type = "continuous", lb = 0) %>%  # slack for missed columns
  add_constraint(s[j] <= M * A[j], j = 1:nt) %>%                 # slack allowed only when A[j] = 1
  add_constraint(s[j] >= A[j], j = 1:nt) %>%
  add_constraint(sum_expr(vals[i, j] * x[i], i = 1:nr) + A[j] + s[j] >= targets[j], j = 1:nt) %>%
  set_objective(sum_expr(-M * A[j], j = 1:nt), "max")            # i.e. minimise the number of missed columns

model <- solve_model(model, with_ROI(solver = "glpk"))
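Once the solve succeeds, the chosen rows can be read back with ompr's get_solution() (a small usage sketch):

solution <- get_solution(model, x[i])   # data frame with one row per x[i]
solution[solution$value > 0.5, "i"]     # indices of the 4 selected rows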
The model works great for small problems, including ones where no selection of rows can exceed every column's target.
However, the above returns infeasible as soon as I increase the number of columns to even just 150. Given that I tested various scenarios on my smaller example, my hunch is that the model definition itself is OK...
Any suggestions as to why this becomes infeasible? Or perhaps a better way to define my model?
What you want to solve here is called a mixed integer program (MIP), and there is a lot of (mostly commercial) software designed around it.
Your typical R functions such as optim are hardly any good for it because of the kind of constraints involved, but you can use specialized software (such as CBC) as long as you are able to frame the problem in a standard MIP structure (in this case the variables to optimize are binary indicators, one per row of your data).
As an alternative, you could also look at the package nloptr with its global derivative-free black-box optimizers, into which you can enter a function like this (setting bounds on the variables) and let it optimize with some general-purpose heuristics.
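As a rough illustration of that black-box route (the function and variable names below are made up, and the exact nloptr call with bounds, algorithm choice, and stopping options is left out), the objective is just a counting function over a 0/1 row-selection vector:

# Illustrative only: count how many column sums beat their targets for a
# given (possibly fractional) row-selection vector, rounding to 0/1 first.
count_hits <- function(x, vals, targets) {
  sel <- round(x) == 1
  if (!any(sel)) return(0)
  sum(colSums(vals[sel, , drop = FALSE]) > targets)
}

# Tiny demonstration on random data
set.seed(1)
vals    <- matrix(sample.int(100, 50), nrow = 10)   # 10 rows x 5 columns
targets <- rep(150, ncol(vals))
count_hits(c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0), vals, targets)

A derivative-free optimizer would maximize this (or minimize its negative) over bounds of 0 to 1 per row, and the "exactly k rows selected" restriction would still need to be enforced separately, for example through a penalty term in the objective.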