I am constructing bootstrap function that will estimate logistic regression on each bootstrapped data set and do some additional calculations. For small samples sometimes it may produce warnings such as
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
I would like to indicate such warnings for each bootstrap iteration. Here is my code:
library(parallel)
n=length(y);
dataN=as.matrix((cbind(y,data)));
cl <- makeCluster(detectCores())
clusterExport(cl,c("dataN"), envir=environment())
clusterExport(cl,c("n"), envir=environment())
clusterEvalQ(cl,library(glmnet))
q=2000;
repl1=parLapply(cl=cl,1:q,function(i,dataA=dataN,smpl=n,...){
dataB <- dataA[sample(nrow(dataA),size=smpl,replace=TRUE),]
p=dim(dataB)[2];p
forname=c(colnames(dataB)[2:p]);
formulaZS=as.formula(paste("y~",paste(forname,collapse="+"),sep=""));
assign("last.warning", NULL, envir = baseenv())
m=glm(formulaZS,family=binomial,data=data.frame(dataB))
coefs=summary(m)$coef[,1];coefs
a=length(attr(warnings(),"names"))
indicator=ifelse(a==0,0,1);
return(list(coefs=coefs,indicator=indicator))
})
stopCluster(cl)
beta=t(matrix(unlist(lapply(repl1, "[[", "coefs")),nrow=p+1));
beta.est=apply(beta,2,mean);beta.est
indicator=t(matrix(unlist(lapply(repl1, "[[", "indicator")),nrow=1));
sum(indicator)
Before each regression estimation I'm resetting warnings by
assign("last.warning", NULL, envir = baseenv())
and then checking if there was warning by calculating the length of warnings()
names. This works if I'm doing it manually, without application of parLapply
. However, if I'm running this function it produces the vector indicator
with only zeroes even if there were warnings.
Data generating code:
fundata=function(n,p,corr){
R = matrix(rep(corr,p*p),nrow=p)+(1-corr)*diag(p);
R <- round(((R * lower.tri(R)) + t(R * lower.tri(R))),2)
diag(R) <- 1
U = t(chol(R))
nvars = dim(U)[1]
random.normal = matrix(rnorm(nvars*n,mean=0,sd=1), nrow=nvars, ncol=n);
X = U %*% random.normal
newX = t(X)
data = as.data.frame(newX)
names(data)<-sprintf("V%d",1:p)
return(data=data)
}
n=50;
p=5;
corr=0.5;
seed=123;
#Generate independent variables
data=fundata(n,p,corr);
#Generate dependent variable
p=dim(data)[2];p
true=c(-1,0,0,0.2,0.675,-1.5)
z=as.matrix(cbind(rep(1,n),data[,1:p]))%*%true;
pr = 1/(1+exp(-z));
y = rbinom(n=n, size=1, prob=t(pr))
dataN=as.matrix((cbind(y,data)))