How to calculate variance explained by a variable of interest, in a lm model with covariates?


I am working in R on linear regressions with covariates, of the form: lm(x ~ y + a + b + c)

With the summary() function, I can get the p-value for each variable in the model. However, I only get the R^2 for the whole model, which says nothing about the contribution of my variable of interest (y).

How do I get the R^2 corresponding to the part of the variance explained by y alone?

I tried:

sapply(model,function(x) summary(x)$r.squared)

as advised here: Print R-squared for all of the models fit with lmList, but it returns

Error in summary(x)$r.squared : $ operator is invalid for atomic vectors

I was also advised to calculate the difference between the R^2 of my model and the R^2 of a linear model without my variable of interest. Is that a valid method? Either way, I would still like to know whether there is an easier way to do it, for example a function in some package.


There are 2 best solutions below

AkselA On BEST ANSWER

It doesn't make much sense to talk about the R-squared of individual variables in a multiple regression model, because correlated predictors share explained variance. Delta R-squared, as Zephryl mentions, and partial R-squared are two measures that will do something like what you want.

Discussion on delta R-squared and partial R-squared: https://stats.stackexchange.com/questions/64010/importance-of-predictors-in-multiple-regression-partial-r2-vs-standardized

Partial R-squared can be calculated like this:

fm1 <- lm(rating ~ ., data=attitude)
# summary(fm1 <- lm(sr ~ ., data=LifeCycleSavings))
# summary(fm1 <- lm(Employed ~ ., data=longley))
# summary(fm1 <- lm(stack.loss ~ stack.x))

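# Refit the model once per predictor, each time dropping that one term
reduced <- lapply(seq_len(ncol(fm1$model)-1), function(x) update(fm1, terms(fm1)[-x]))

# Residual sum of squares of each reduced model and of the full model
reduced.sse <- sapply(reduced, deviance)
fm1.sse <- deviance(fm1)
# Partial R-squared: share of the reduced model's residual variation that the
# dropped term removes when added back (the leading 0 fills the intercept row)
partial.r2 <- c(0, (reduced.sse - fm1.sse)/reduced.sse)
(fm1.coefs <- cbind(summary(fm1)$coefficients, partial.r2))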
#                 Estimate  Std. Error     t value      Pr(>|t|)   partial.r2
# (Intercept) 10.787076386 11.58925724  0.93078238 0.36163372105 0.0000000000
# complaints   0.613187608  0.16098311  3.80901816 0.00090286788 0.3868076082
# privileges  -0.073050143  0.13572469 -0.53822295 0.59559392051 0.0124382943
# learning     0.320332116  0.16852032  1.90085160 0.06992534595 0.1357684079
# raises       0.081732134  0.22147768  0.36903102 0.71548008844 0.0058861866
# critical     0.038381447  0.14699544  0.26110638 0.79633426421 0.0029554369
# advance     -0.217056682  0.17820947 -1.21798623 0.23557704863 0.0605914609
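
As a cross-check on the partial.r2 column, here is a minimal sketch that gets the same numbers without refitting. It assumes every term is a single numeric predictor (one coefficient per term), in which case partial R-squared equals t^2 / (t^2 + residual degrees of freedom). The names tvals, partial.r2.t and partial.r2.refit are only illustrative, and the refit is written with reformulate() rather than the update() call used above.

```r
# Full model, same as fm1 above
fm1 <- lm(rating ~ ., data = attitude)

# t statistics of the predictors (intercept row dropped)
tvals <- summary(fm1)$coefficients[-1, "t value"]

# Identity for a single-coefficient term:
# partial R^2 = t^2 / (t^2 + residual degrees of freedom)
partial.r2.t <- tvals^2 / (tvals^2 + df.residual(fm1))

# Same quantity via drop-one refits, for comparison
term.labels <- attr(terms(fm1), "term.labels")
partial.r2.refit <- sapply(term.labels, function(v) {
  reduced <- lm(reformulate(setdiff(term.labels, v), response = "rating"),
                data = attitude)
  (deviance(reduced) - deviance(fm1)) / deviance(reduced)
})

# The two columns should agree up to floating-point error
print(round(cbind(partial.r2.t, partial.r2.refit), 4))
```

The shortcut does not apply to factors or other terms that contribute more than one coefficient; for those, the refit approach is the one to use.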
jay.sf On

You were close with your sapply. Just put a full and a restricted model in a list and take the difference, e.g.

> full <- lm(mpg ~ ., mtcars)
> rest <- lm(mpg ~ . - wt, mtcars)
> diff(sapply(list(full, rest), \(x) summary(x)$r.squared))
[1] -0.02399046

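The sign is negative only because diff() subtracts the full model's R-squared from the restricted model's. In absolute terms, weight (wt) accounts for about 2.4% of the variance in miles per gallon beyond what the other predictors already explain.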
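
The original sapply(model, ...) call failed because sapply() iterated over the components of a single lm object (coefficients, residuals, and so on) rather than over models, so the models need to be wrapped in list(). The following is a sketch of the same comparison with named elements, so the sign of the difference is explicit, plus partial R-squared and a nested-model F test. The names r2, delta.r2 and partial.r2 are illustrative choices, not part of any package.

```r
# Full model and the same model without the variable of interest (wt)
full <- lm(mpg ~ ., data = mtcars)
rest <- lm(mpg ~ . - wt, data = mtcars)

# sapply() needs a list of fitted models, not a single lm object
r2 <- sapply(list(full = full, rest = rest),
             function(m) summary(m)$r.squared)

# Delta R^2: share of the total variance uniquely attributable to wt
delta.r2 <- unname(r2["full"] - r2["rest"])

# Partial R^2: share of the variance left unexplained by the other
# predictors that wt accounts for
partial.r2 <- (deviance(rest) - deviance(full)) / deviance(rest)

print(c(delta.r2 = delta.r2, partial.r2 = partial.r2))

# F test of whether adding wt improves the fit
print(anova(rest, full))
```

Because wt enters as a single coefficient, the p-value of this F test should match the one reported for wt by summary(full). Delta R-squared is measured against the total variance, while partial R-squared is measured against what the restricted model leaves unexplained, so the partial value is never the smaller of the two.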