R crashes when running very large dredge in MuMIn


I've been running dredge from the MuMIn package on a very large global model, with the end goal of getting importance values for each variable. The dredge originally finished successfully in 2 days, but I later needed to add 2 variables, and since adding them it has not run to completion. The first time, it ran for 7 days before R crashed (I'm assuming): when I next checked on it, R had shut down, and when I restarted RStudio there were no error messages, just a fresh window. I started the code again, and this time it ran for 8 days before doing the same thing.

I'll explain my global model and subsets so it is clear why the code takes so long to run. It is a gam with 19 variables and 4 interaction terms (all numeric except "year", which is categorical). Certain variables cannot appear together in a model because one is an alternate version of the other, so I have excluded those combinations with subset. The interaction terms complicate things further. When an interaction term is in a model, the smoothed version of that term cannot also be a main effect, because the unsmoothed version is automatically included. Conversely, the unsmoothed version cannot appear as a main effect when the interaction term is not in the model, since the smoothed version may already be there. Either case would represent one term twice in a model, so I have subsetted those options out as well. Here is the code for the global model and the dredge:

library(mgcv)   # assumed source of gam(); the original post does not load it explicitly
library(MuMIn)
options(na.action = "na.fail")

real.model2 <- gam(resqpa ~ factor(year) + s(elev) + factor(year)*elev + 
s(bf1) + s(bf2) + s(bf3) + s(open) + s(water) + s(bfbs1) + s(bfbs2) + 
s(bfbs3) + s(bs2) + s(bs3) + s(mix) + s(cs1) + s(cs2) + factor(year)*bfbs2 + 
factor(year)*bfbs3 + factor(year)*bs2 + factor(year)*bs3 + s(allbf) + s(ct1) 
+ s(ct2) + s(allcs), family=binomial(link="cloglog"))

fits6 <- dredge(real.model2, subset =  (!("s(allbf)" & "s(bf1)")) 
            & (!("s(allbf)" & "s(bf2)")) 
            & (!("s(allbf)" & "s(bf3)")) 
            & (!("s(ct1)" & "s(bf1)")) 
            & (!("s(ct1)" & "s(bfbs1)")) 
            & (!("s(ct2)" & "s(bf2)")) 
            & (!("s(ct2)" & "s(bfbs2)"))
            & (!("s(ct2)" & "s(bs2)"))
            & (!("s(allcs)" & "s(cs1)"))
            & (!("s(allcs)" & "s(cs2)"))
            & (!(`s(elev)` & "elev:factor(year)")) & (!elev  | `elev:factor(year)`) 
            & (!(`s(bfbs2)` & "bfbs2:factor(year)")) & (!bfbs2  | `bfbs2:factor(year)`) 
            & (!(`s(bfbs3)` & "bfbs3:factor(year)")) & (!bfbs3  | `bfbs3:factor(year)`) 
            & (!(`s(bs2)` & "bs2:factor(year)")) & (!bs2  | `bs2:factor(year)`) 
            & (!(`s(bs3)` & "bs3:factor(year)")) & (!bs3  | `bs3:factor(year)`))

After R crashed the second time I ran a diagnostic; however, the time stamp for the error did not match the time when R crashed so I don't know if this is informative or not.

24 Sep 2018 03:06:26 [rdesktop] ERROR system error 231 (All pipe instances are busy); OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest()

Is this a problem with needing more computational power, or is there a more concise/more correct way of writing this code that will allow it to finish? Thanks in advance for any help with this!

There is 1 best solution below

It is difficult to tell what caused the crash when R is run through RStudio. Try running your R script in plain R, in a console rather than RGui (or non-interactively via R CMD BATCH), so that the output is preserved after the crash. Also set dredge's trace argument to TRUE, so that each model call is printed before it is fitted and you can see which model is causing the problem.
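As a rough sketch of that setup, the script below could be saved to a file and run outside RStudio. The file names run_dredge.R and fits.rds are hypothetical, and the small lm on the built-in mtcars data is only a stand-in for the real gam and its subset expression.

```r
# Hypothetical script file: run_dredge.R
# Run it from a terminal so the log survives a crash, for example:
#   R CMD BATCH run_dredge.R run_dredge.Rout
library(MuMIn)
options(na.action = "na.fail")

# Stand-in global model (assumption); replace with the real gam.
global <- lm(mpg ~ wt + hp, data = mtcars)

# trace = TRUE prints every model call before it is fitted, so the last
# call in the log is the one that was running when R went down.
fits <- dredge(global, trace = TRUE)

# Save the result as soon as it exists.
saveRDS(fits, "fits.rds")
```

With R CMD BATCH, everything printed by trace ends up in the .Rout file, which can be inspected after a crash.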

Another way is to create a list of all model calls via dredge(..., evaluate = FALSE) and then loop over it, evaluating each call. You could store all the resulting model objects in a list, but that would create an extremely large object, so it may be better to save them to disk each time instead of keeping them in memory, and/or to create partial model selection tables that can be merged with rbind later on. This approach has the advantage that you can restart the loop past the point where it previously crashed (provided you save all the necessary data on disk).
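A minimal sketch of such a restartable loop follows. It is not a definitive implementation: the directory name dredge_parts and the one-row summary columns are assumptions made for illustration, and the lm on mtcars again stands in for the real gam. For simplicity it stores a plain data frame row per model rather than a MuMIn model selection table.

```r
library(MuMIn)
options(na.action = "na.fail")

# Stand-in global model (assumption); replace with the real gam and
# pass the same subset expression to dredge().
global <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# List of unevaluated model calls; nothing is fitted yet.
calls <- dredge(global, evaluate = FALSE)

out_dir <- "dredge_parts"   # hypothetical output directory
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_along(calls)) {
  f <- file.path(out_dir, sprintf("model_%07d.rds", i))
  if (file.exists(f)) next  # fitted in an earlier run: skip on restart

  fit <- eval(calls[[i]])
  ll  <- logLik(fit)

  # Keep only a one-row summary on disk, not the fitted object itself.
  saveRDS(data.frame(
    index   = i,
    formula = paste(deparse(formula(fit)), collapse = " "),
    df      = attr(ll, "df"),
    logLik  = as.numeric(ll),
    AICc    = AICc(fit)
  ), f)
}

# Merge the partial results and rank them by AICc.
files <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE)
tab   <- do.call(rbind, lapply(files, readRDS))
tab   <- tab[order(tab$AICc), ]
```

If R crashes partway through, rerunning the same script skips every model whose file already exists and resumes with the first one that is missing.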