I have three questions:
What problem does spatial autocorrelation cause in linear regression?
Why can it be resolved by mixed linear models?
Is the model implementation below correct? Or do I need to add a correlation structure?
I have the following spatio-temporal dataset and I want to find out which of my predictor variables (bio8, bio9, bio16, bio17) are significant in explaining production:
> head(RegrInput)
  DistrID Year Production        bio8        bio9      bio16      bio17
1       1 1982   1433.833  0.17695395  0.00240241  -18.73348  24.933607
2       1 1983   1151.877 -0.06570671 -0.56608232   19.08482  12.682994
3       1 1984   1317.626  0.48731404  0.64346423 -113.17526 -33.892477
...
There are 65 districts (DistrID) and 34 years. The problem is that the predictors are spatially autocorrelated, so they will be similar in neighboring districts. I have heard that in such a case the p-values of a linear model
lm(Production ~ bio8 + bio9...)
cannot be trusted. I have also heard that a linear mixed model such as
lme(Production ~ bio8 + bio9..., random = ~1 | DistrID)
can resolve this problem.
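To make the two candidate models concrete, here is a minimal sketch in R. It rests on several assumptions: the truncated formulas are taken to contain all four predictors, `sim` is a simulated stand-in for RegrInput (same column names, made-up values), and the third fit adds a within-district AR(1) term in `Year` purely as one example of a correlation structure. That third model addresses temporal correlation only; a spatial structure such as `corExp` would need district coordinates, which the data shown do not include.

```r
library(nlme)  # provides lme() and corAR1()

set.seed(1)
n_distr <- 65
n_year  <- 34

# Hypothetical stand-in for RegrInput: same column names, simulated values
sim <- expand.grid(Year = 1982:(1982 + n_year - 1), DistrID = 1:n_distr)
sim$bio8  <- rnorm(nrow(sim))
sim$bio9  <- rnorm(nrow(sim))
sim$bio16 <- rnorm(nrow(sim), sd = 50)
sim$bio17 <- rnorm(nrow(sim), sd = 20)

# Each district gets its own intercept shift (the effect a random intercept targets)
distr_effect <- rnorm(n_distr, sd = 200)
sim$Production <- 1300 + 100 * sim$bio8 +
  distr_effect[sim$DistrID] + rnorm(nrow(sim), sd = 100)

# Ordinary linear model: treats all 65 * 34 rows as independent
fit_lm <- lm(Production ~ bio8 + bio9 + bio16 + bio17, data = sim)

# Random intercept per district: rows from the same district share a level
fit_lme <- lme(Production ~ bio8 + bio9 + bio16 + bio17,
               random = ~ 1 | DistrID, data = sim)

# Same model plus AR(1) residual correlation across years within a district
fit_ar1 <- lme(Production ~ bio8 + bio9 + bio16 + bio17,
               random = ~ 1 | DistrID,
               correlation = corAR1(form = ~ Year | DistrID),
               data = sim)

summary(fit_lm)$coefficients   # fixed-effect estimates and p-values, lm
summary(fit_lme)$tTable        # fixed-effect estimates and p-values, lme
anova(fit_lme, fit_ar1)        # does the added correlation structure help?
```

On the real data, comparing the standard errors from `fit_lm` and `fit_lme` shows how much the grouping by district changes the inference. Note that the random intercept only accounts for similarity among years within one district, not for similarity between neighboring districts.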
I understand that spatial autocorrelation violates the independence of observations. I also gather that such observations will not produce normally distributed residuals. So two main assumptions of linear regression are violated. But maybe somebody can point me to a basic, illustrative reference or explain in simple terms why this is a problem and how a mixed model would resolve it here.
That would be very nice. Thanks a lot!
Felix