I have been using glm() in R to compute confidence intervals for the logit probability parameter governing a single binomial draw.
P <- 20 # Number of successes
D <- 1 # Number of failures
model1 <- glm(matrix(c(P,D), nrow=1) ~ 1, family="binomial") # Successes modeled as binomial draw from successes+failures
summary(model1)
confint(model1) # Confidence interval on binomial parameter p
I notice that if either the number of successes or failures is zero (P=0
or D=0
), the confidence intervals returned are meaningless.
I then computed my own confidence intervals by numerically integrating the normalized binomial likelihood:
binomial_fun <- function(p) {choose(N, K)*(p^K)*(1-p)^(N-K)} # A binomial function
logit_fun <- function(p) {log(p/(1-p))} # A logit function
f_upper <- function(a) {integrate(binomial_fun, 0, a )$value/integrate(binomial_fun, 0, 1 )$value - .975}
f_lower <- function(a) {integrate(binomial_fun, a, 1 )$value/integrate(binomial_fun, 0, 1 )$value - .975}
# These functions will take value zero when a corresponds to the 97.5% and 2.5% points
# of the normalized binomial likelihood given N and K.
N <- P+D # N is the number of trials
K <- P # K is the number of successes
uci2 <- logit_fun(uniroot( f_upper, c(0, 1) )$root) # Look for a solution in [0,1]
lci2 <- logit_fun(uniroot( f_lower, c(0, 1) )$root) # Look for a solution in [0,1]
This gave meaningful confidence intervals where P=0
or D=0
.
However, it gives different confidence intervals from glm()
+ confint()
even when neither P
nor D
is zero.
confint(model1)
c(lci2, uci2)
Compared to glm()
+ confint()
, both the lci and uci that I compute tend to be closer to zero.
How is confint()
computing the interval? I'm sure there is a good reason for doing it this way in terms of performance on more complicated glms, but it seems like a weird result in this simple case.