I have been trying, so far without success, to reproduce by hand the Sargan test statistic that R reports.
When I estimate the model with `ivreg()` and print the Sargan statistic from the diagnostics:
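```r
library(readstata13)  # read.dta13()
library(AER)          # ivreg() and summary(..., diagnostics = TRUE)
eitc <- read.dta13('education_earnings_v2.dta')
eitc$ln.wage <- log(eitc$wage)
# 2SLS: educ is endogenous; nearc4 and nearc2 are the excluded instruments
TSLS <- ivreg(ln.wage ~ educ + exper + south + nonwhite |
                nearc4 + nearc2 + exper + south + nonwhite, data = eitc)
summary(TSLS, diagnostics = TRUE)
```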
it reports a Sargan statistic of 1.63. However, when I carry out the test manually:
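```r
# Step 1: first stage, educ on the instruments and the exogenous controls
surp_IV1 <- lm(educ ~ nearc2 + nearc4 + exper + south + nonwhite, data = eitc)
surp_IV_fit <- surp_IV1$fitted.values
# Step 2: second stage, ln.wage on fitted educ and the exogenous controls
surp_IV2 <- lm(ln.wage ~ surp_IV_fit + exper + south + nonwhite, data = eitc)
surp_resid <- resid(surp_IV2)  # residuals of this second-stage lm() fit
# Step 3: regress those residuals on the instruments and the exogenous controls
test_surplus <- lm(surp_resid ~ nearc2 + nearc4 + exper + south + nonwhite,
                   data = eitc)
summary(test_surplus)
```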
this auxiliary regression has R-squared = 0.0008032 on 3,010 observations, so I get a test statistic of n·R² = 3,010 × 0.0008032 ≈ 2.42.
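To make the arithmetic explicit, here is a small helper that turns the auxiliary R-squared into the n·R² statistic and attaches an upper-tail χ² p-value. `sargan_nr2` is a hypothetical name of my own, not a function from any package, and q = 1 assumes two excluded instruments (nearc4, nearc2) for one endogenous regressor (educ).

```r
# Hypothetical helper (not from any package): n * R^2 overidentification
# statistic from an auxiliary regression, with its chi-squared p-value.
sargan_nr2 <- function(r2, n, q) {
  stat <- n * r2                                       # n * R^2
  p_value <- pchisq(stat, df = q, lower.tail = FALSE)  # P(chi^2_q > stat)
  c(statistic = stat, p.value = p_value)
}

# Values from the manual attempt: R^2 = 0.0008032, n = 3010,
# q = 2 instruments (nearc4, nearc2) - 1 endogenous regressor (educ) = 1
sargan_nr2(r2 = 0.0008032, n = 3010, q = 1)
```

With the objects fitted above, the equivalent call would be `sargan_nr2(summary(test_surplus)$r.squared, nobs(test_surplus), q = 1)`.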
What is the reason for the difference?
I suspect that some of these steps are unnecessary.
The procedure is taken from Section 15-5b, "Testing Overidentification Restrictions", in Wooldridge, *Introductory Econometrics: A Modern Approach* (2019, 7th edition).
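Written out for this model, the three steps of that section read as follows (my own transcription; the notation is mine, and the $\hat\beta_j$ are the 2SLS estimates). Under the null that all instruments are uncorrelated with the structural error, $q$ is the number of instruments from outside the model minus the number of endogenous explanatory variables:

$$
\begin{aligned}
&\text{(i) 2SLS residuals: } \hat u_i = \ln(\text{wage}_i) - \hat\beta_0 - \hat\beta_1\,\text{educ}_i - \hat\beta_2\,\text{exper}_i - \hat\beta_3\,\text{south}_i - \hat\beta_4\,\text{nonwhite}_i \\
&\text{(ii) regress } \hat u_i \text{ on } \text{nearc4}_i,\ \text{nearc2}_i,\ \text{exper}_i,\ \text{south}_i,\ \text{nonwhite}_i \text{ and keep } R^2_1 \\
&\text{(iii) } nR^2_1 \overset{a}{\sim} \chi^2_q, \qquad q = 2 - 1 = 1
\end{aligned}
$$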
Created on 2023-12-15 with reprex v2.0.2