I am working with a large database, but for the sake of illustration I am using the Grunfeld data.
My objective is to divide my data into chunks so my model can run; otherwise I run out of memory (90,000 GB needed). I am using splm for my data, but since it works with plm, I am using the latter for the example. Once I manage to run every chunk, I want to combine the chunk results into a single overall result.
What I have so far is this:
data("Grunfeld", package = "plm")
Grunfeld <- pdata.frame(Grunfeld, index = c("firm", "year"))
# randomly assign each row to one of 4 chunks
s1 <- split(Grunfeld, sample(rep(1:4, length.out = nrow(Grunfeld))))
fm <- value ~ capital
fix <- lapply(seq_along(s1), function(x) plm(fm, data = s1[[x]], model = "within"))
Now I have a list of fitted models in fix, each with its own coefficients and residuals.
Is there a way I can write a function so that the combined result emulates the solution from the complete database instead of 4 separate chunks?
i.e.
Residuals:
      Min.   1st Qu.    Median   3rd Qu.      Max.
-1299.602   -88.290   -10.197    84.142  1324.118

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)
capital 0.551055   0.098634  5.5869 7.971e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    23078000
Residual Sum of Squares: 19807000
R-Squared:      0.14174
Adj. R-Squared: 0.09633
F-statistic: 31.213 on 1 and 189 DF, p-value: 7.9714e-08
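For illustration, one possible combining rule is an inverse-variance weighted average of the per-chunk coefficients. This is an assumption about how to pool, not guaranteed to reproduce the full-data fit exactly; the function name pool_estimates and the numbers below are hypothetical stand-ins for values extracted from each plm fit with coef() and the coefficient standard errors:

```r
# Hypothetical pooling: inverse-variance weighted average of per-chunk estimates.
# b and se stand in for the coefficient and its standard error from each chunk's fit.
pool_estimates <- function(b, se) {
  w <- 1 / se^2                 # precision weights
  est <- sum(w * b) / sum(w)    # pooled coefficient
  se_pooled <- sqrt(1 / sum(w)) # pooled standard error
  c(estimate = est, std.error = se_pooled)
}

# toy numbers standing in for four chunk fits
b  <- c(0.54, 0.56, 0.55, 0.55)
se <- c(0.20, 0.19, 0.21, 0.20)
pooled <- pool_estimates(b, se)
```

Note that this pools coefficients only; residual-level summaries (RSS, R-squared) would have to be accumulated separately across chunks.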
Consider splitting based on nrow of the data frame. The snippet below splits the data into four chunks depending on the data frame's size. An alternative to split + lapply is by:
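A minimal sketch of both approaches, using a toy data frame as a stand-in for Grunfeld so it runs with base R only (for the real data, replace the nrow call in the applied function with the plm(fm, data = d, model = "within") call from the question):

```r
# toy stand-in for Grunfeld; 100 rows, 10 "firms"
df <- data.frame(x = rnorm(100), firm = rep(1:10, each = 10))

# chunk index based on nrow: four roughly equal groups by row position
idx <- cut(seq_len(nrow(df)), breaks = 4, labels = FALSE)

# split + lapply: split into a list of chunks, then apply a function to each
chunks <- split(df, idx)
res1 <- lapply(chunks, function(d) nrow(d))

# by(): splits and applies in one step, giving the same per-chunk results
res2 <- by(df, idx, nrow)
```

Both return one result per chunk; by() just fuses the split and apply steps.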