linear (log-log) model with 'lm': how to get prediction variance of sum of predicted values


I'm fitting a power model to a dataset by applying a simple linear model with the R function lm after a log-log transformation, as in the example below (instead of fitting the power model directly, for example with the nls function). I can use predict.lm to apply the model to new data and calculate prediction intervals.

data(stackloss); dat <- stackloss[c(2, 4)]; colnames(dat) <- c("x","y")
dat.lm <- lm(log(y) ~ log(x), data = dat)

new <- data.frame(x = seq(1, 30, 1))  ## start at 1, because log(0) is -Inf
pred <- predict(dat.lm, new, interval = "prediction", level = 0.95)
matplot(new$x, exp(pred), type = "l", col = 1, lty = c(1, 2, 2)); points(dat$x, dat$y)

Now I need to sum n predicted values (which is straightforward after applying the 'exp' function) and also calculate the variance and prediction interval of that sum. This has been described for a simple linear model in the following Q&A: linear model with `lm`: how to get prediction variance of sum of predicted values. That answer introduced the functions lm_predict (which computes the complete variance-covariance matrix of the predicted values) and agg_pred for the simple linear model.

lm_predict <- function (lmObject, newdata, diag = TRUE) {
  ## input checking
  if (!inherits(lmObject, "lm")) stop("'lmObject' is not a valid 'lm' object!")
  ## extract "terms" object from the fitted model, but delete response variable
  tm <- delete.response(terms(lmObject))      
  ## linear predictor matrix
  Xp <- model.matrix(tm, newdata)
  ## predicted values by direct matrix-vector multiplication
  pred <- c(Xp %*% coef(lmObject))
  ## efficiently form the complete variance-covariance matrix
  QR <- lmObject$qr   ## qr object of fitted model
  piv <- QR$pivot     ## pivoting index
  r <- QR$rank        ## model rank / numeric rank
  if (is.unsorted(piv)) {
    ## pivoting has been done
    B <- forwardsolve(t(QR$qr), t(Xp[, piv]), r)
    } else {
    ## no pivoting is done
    B <- forwardsolve(t(QR$qr), t(Xp), r)
    }
  ## residual variance
  sig2 <- c(crossprod(residuals(lmObject))) / df.residual(lmObject)
  if (diag) {
    ## return point-wise prediction variance
    VCOV <- colSums(B ^ 2) * sig2
    } else {
    ## return full variance-covariance matrix of predicted values
    VCOV <- crossprod(B) * sig2
    }
  list(fit = pred, var.fit = VCOV, df = lmObject$df.residual, residual.var = sig2)
  }

agg_pred <- function (w, predObject, alpha = 0.95) {
  ## input checking
  if (length(w) != length(predObject$fit)) stop("'w' has wrong length!")
  if (!is.matrix(predObject$var.fit)) stop("'predObject' has no variance-covariance matrix!")
  ## mean of the aggregation
  agg_mean <- c(crossprod(predObject$fit, w))
  ## variance of the aggregation
  agg_variance <- c(crossprod(w, predObject$var.fit %*% w))
  ## adjusted variance-covariance matrix
  VCOV_adj <- with(predObject, var.fit + diag(residual.var, nrow(var.fit)))
  ## adjusted variance of the aggregation
  agg_variance_adj <- c(crossprod(w, VCOV_adj %*% w))
  ## t-distribution quantiles
  Qt <- c(-1, 1) * qt((1 - alpha) / 2, predObject$df, lower.tail = FALSE)
  ## names of CI and PI
  NAME <- c("lower", "upper")
  ## CI
  CI <- setNames(agg_mean + Qt * sqrt(agg_variance), NAME)
  ## PI
  PI <- setNames(agg_mean + Qt * sqrt(agg_variance_adj), NAME)
  ## return
  list(mean = agg_mean, var = agg_variance, CI = CI, PI = PI)
  }
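
As a sanity check on what lm_predict returns with diag = FALSE, the sketch below recomputes the same covariance matrix in two ways using base R only: the QR route used inside lm_predict, and the textbook form Xp %*% vcov(fit) %*% t(Xp). The new x values (18, 20, 25) and the object names fit, V_qr and V_vcov are chosen here purely for illustration.

```r
## log-log fit on the example data from the question
dat <- setNames(stackloss[c(2, 4)], c("x", "y"))
fit <- lm(log(y) ~ log(x), data = dat)

## illustrative new data (arbitrary values inside the observed range of x)
new <- data.frame(x = c(18, 20, 25))
Xp <- model.matrix(delete.response(terms(fit)), new)

## QR route, as in lm_predict: B = R'^{-1} Xp', so crossprod(B) = Xp (X'X)^{-1} Xp'
QR <- fit$qr
B <- forwardsolve(t(QR$qr), t(Xp[, QR$pivot, drop = FALSE]), QR$rank)
sig2 <- sum(residuals(fit)^2) / df.residual(fit)
V_qr <- crossprod(B) * sig2

## textbook route via the coefficient covariance matrix
V_vcov <- Xp %*% vcov(fit) %*% t(Xp)

## the two matrices should agree up to numerical error
same <- isTRUE(all.equal(V_qr, V_vcov, check.attributes = FALSE))
```

Both matrices describe the log-scale fitted values, which is why they cannot be plugged into a sum on the original scale without a further transformation.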

However, these cannot be applied directly to aggregate the variance properly in the case of log-log regression, because the predictions and their variances are on the log scale while the sum is needed on the original scale. Maybe I should transform the variance in the output of lm_predict, but I couldn't figure out how to proceed. Thank you in advance for any help.
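
One possible direction, offered as a rough sketch rather than a definitive answer: if the log-scale predictions are treated as jointly normal with mean mu and covariance Sigma (fitted-value covariance plus residual variance on the diagonal), then the back-transformed values are jointly log-normal, and the mean and variance of their sum have a closed form. An interval for the sum can then be approximated by simulation. This is a plug-in approximation: it assumes normal errors on the log scale and treats the estimated residual variance as known, so it uses normal rather than t quantiles. The new x values and all object names below are illustrative.

```r
## log-log fit on the example data from the question
dat <- setNames(stackloss[c(2, 4)], c("x", "y"))
fit <- lm(log(y) ~ log(x), data = dat)
new <- data.frame(x = 18:27)   # illustrative new data

## log-scale predictive mean and full predictive covariance matrix
Xp <- model.matrix(delete.response(terms(fit)), new)
mu <- c(Xp %*% coef(fit))
sig2 <- summary(fit)$sigma^2
Sigma <- Xp %*% vcov(fit) %*% t(Xp) + diag(sig2, nrow(Xp))

## moments of jointly log-normal variables:
## E[Y_i] = exp(mu_i + Sigma_ii / 2)
## Cov(Y_i, Y_j) = E[Y_i] * E[Y_j] * (exp(Sigma_ij) - 1)
m <- exp(mu + diag(Sigma) / 2)
sum_mean <- sum(m)
sum_var <- sum(outer(m, m) * (exp(Sigma) - 1))

## Monte Carlo approximation of a 95% prediction interval for the sum
set.seed(1)
nsim <- 1e4
R <- chol(Sigma)                               # t(R) %*% R equals Sigma
Z <- matrix(rnorm(nsim * length(mu)), nsim)    # independent standard normals
logY <- sweep(Z %*% R, 2, mu, "+")             # rows ~ N(mu, Sigma)
sims <- rowSums(exp(logY))
PI <- quantile(sims, c(0.025, 0.975))
```

Dropping the diag(sig2, ...) term from Sigma would give the analogue of a confidence interval for the sum of expected values on the log-normal scale. A weighted sum could be handled by multiplying m (and the simulated columns) by the weights before summing.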
