How to find the minimum floating-point value accepted by betareg package?

178 Views Asked by At

I'm doing a beta regression in R, which requires values between 0 and 1, endpoints excluded, i.e. (0,1) instead of [0,1].

I have some 0 and 1 values in my dataset, so I'd like to convert them to the smallest possible neighbor, such as 0.0000...0001 and 0.9999...9999. I've used .Machine$double.xmin (which gives me 2.225074e-308), but betareg() still gives an error:

invalid dependent variable, all observations must be in (0, 1)

If I use 0.000001 and 0.999999, I got a different set of errors:

1: In betareg.fit(X, Y, Z, weights, offset, link, link.phi, type, control) : failed to invert the information matrix: iteration stopped prematurely
2: In sqrt(wpp) : Error in chol.default(K) : the leading minor of order 4 is not positive definite

Only if I use 0.0001 and 0.9999 I can run without errors. Is there any way I can improve this minimum values with betareg? Or should I just be happy with that?

1

There are 1 best solutions below

0
On

Try it with eps (displacement from 0 and 1) first equal to 1e-4 (as you have here) and then with 1e-3. If the results of the models don't differ in any way you care about, that's great. If they are, you need to be very careful, because it suggests your answers will be very sensitive to assumptions.

In the example below the dispersion parameter phi changes a lot, but the intercept and slope parameter don't change very much.

If you do find that the parameters change by a worrying amount for your particular data, then you need to think harder about the process by which zeros and ones arise, and model that process appropriately, e.g.

  • a censored-data model: zero/one arise through a minimum/maximum detection threshold, models the zero/one values as actually being somewhere in the tails or
  • a hurdle/zero-one inflation model: zeros and ones arise through a separate process from the rest of the data, use a binomial or multinomial model to characterize zero vs. (0,1) vs. one, then use a Beta regression on the (0,1) component)

Questions about these steps are probably more appropriate for CrossValidated than for SO.

sample data

set.seed(101)
library(betareg)
dd <- data.frame(x=rnorm(500))
rbeta2 <- function(n, prob=0.5, d=1) {
    rbeta(n, shape1=prob*d, shape2=(1-prob)*d)
}
dd$y <- rbeta2(500,plogis(1+5*dd$x),d=1)
dd$y[dd$y<1e-8] <- 0

trial fitting function

ss <- function(eps) {
    dd <- transform(dd,
                    y=pmin(1-eps,pmax(eps,y)))
    m <- try(betareg(y~x,data=dd))
    if (inherits(m,"try-error")) return(rep(NA,3))
    return(coef(m))
}
ss(0)  ## fails
ss(1e-8) ## fails
ss(1e-4)
## (Intercept)           x       (phi) 
##   0.3140810   1.5724049   0.7604656
ss(1e-3)  ## also fails
ss(1e-2)
## (Intercept)           x       (phi) 
##   0.2847142   1.4383922   1.3970437
ss(5e-3)
## (Intercept)           x       (phi) 
##   0.2870852   1.4546247   1.2029984

try it for a range of values

evec <- seq(-4,-1,length=51)
res <- t(sapply(evec, function(e) ss(10^e)) )
library(ggplot2)
ggplot(data.frame(e=10^evec,reshape2::melt(res)),
       aes(e,value,colour=Var2))+
    geom_line()+scale_x_log10()

enter image description here