I have a dataset with one column for the dependent variable and nine columns for the independent variables. I have to fit logit models in R for every combination of the independent variables.
I have created formulae for this, to be passed to the "glm" function. However, every time I call "glm", it loads the data again, even though the data are the same in every call and only the formula changes between iterations.
Is there a way to avoid this so as to speed up the computation? Can I pass a vector of formulae to "glm" and load the data only once?
Code:
tempCoeffV <- lapply(formuleVector, function(s) {
  # keep only the coefficients; drop the response and model frame to save memory
  coef(glm(as.formula(s), data = myData, family = binomial, y = FALSE, model = FALSE))
})
formuleVector is a vector of strings like:
myData[,1]~myData[,2]+myData[,3]+myData[,5]
myData[,1]~myData[,2]+myData[,6]
myData is a data.frame with around 100,000 records; it remains the same in each lapply iteration. formuleVector is a vector of 511 different formulas (one per non-empty subset of the nine predictors). Is there a way to speed up this computation?
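For reference, a minimal sketch of how such a vector of formula strings can be generated with "combn", assuming the response sits in column 1 and the nine predictors in columns 2 through 10:
predCols <- 2:10  # the nine predictor columns
formuleVector <- unlist(lapply(seq_along(predCols), function(k) {
  # for each subset size k, build one formula string per k-subset of predictors
  combn(predCols, k, FUN = function(ix) {
    paste0("myData[,1] ~ ", paste0("myData[,", ix, "]", collapse = " + "))
  })
}))
length(formuleVector)  # 511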
Great, you don't have factors; otherwise I would have to call "model.matrix" and play with its "$assign" field, rather than simply using "data.matrix". This is how you get your 511 candidates, right?
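A minimal sketch of that step, assuming all columns of myData are numeric with the response in column 1:
X <- data.matrix(myData[, 2:10])  # 100,000 x 9 numeric design matrix (no intercept yet)
y <- myData[, 1]                  # binary response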
Now, instead of the number of combinations, we need a combination index, which is easy to get from "combn". The rest of the story is to write a loop nest and loop through all the combinations. "glm.fit" is used, as you only care about the coefficients. "glm.fit" is much more costly than the "for" loops themselves, so for readability, don't recode them as "lapply", for example. In the end, "lst" is a nested list. Use "str(lst)" to understand it.
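Here is a minimal sketch of that loop nest, under the assumption that X and y were built as above ("lst" and the other names are placeholders):
p <- ncol(X)
lst <- vector("list", p)                  # one slot per subset size k
for (k in 1:p) {
  ix <- combn(p, k)                       # each column indexes one k-subset of predictors
  lst[[k]] <- vector("list", ncol(ix))
  for (j in seq_len(ncol(ix))) {
    # glm.fit() does not add an intercept, so bind a column of 1s manually
    fit <- glm.fit(cbind(1, X[, ix[, j], drop = FALSE]), y, family = binomial())
    lst[[k]][[j]] <- fit$coefficients
  }
}
str(lst, max.level = 1)                   # 9 components, holding choose(9, k) fits each
This gives 511 fits in total, since sum(choose(9, 1:9)) is 511.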