Split a panel regression into chunks and get one combined result in R


I am working with a large database, but for the sake of illustration I am using the Grunfeld data.

My objective is to divide my data into chunks so that my model can run; otherwise I run out of memory (90,000 GB needed). I am using splm for my real data, but since it builds on plm I am using the latter for this example. Once I manage to run every chunk, I want to combine the pieces into one overall result.

What I have so far is this:

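library(plm)
data("Grunfeld", package="plm")
Grunfeld <- pdata.frame(Grunfeld, index = c("firm","year"))
# randomly assign each row to one of 4 chunks
s1 <- split(Grunfeld, sample(rep(1:4, length.out = nrow(Grunfeld))))
fm <- value ~ capital
# fit the within model on each chunk
fix <- lapply(seq_along(s1), function(x) plm(fm, data = s1[[x]], model = "within"))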

Now I have a list, fix, holding the coefficients and residuals of each chunk.

Is there a way to write a function so that my result emulates the solution for the complete database instead of 4 separate chunks?

That is, I would like to reproduce the full-data output:

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-1299.602   -88.290   -10.197    84.142  1324.118 

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)    
capital 0.551055   0.098634  5.5869 7.971e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    23078000
Residual Sum of Squares: 19807000
R-Squared:      0.14174
Adj. R-Squared: 0.09633
F-statistic: 31.213 on 1 and 189 DF, p-value: 7.9714e-08

There is 1 solution below


Consider splitting based on the number of rows (nrow) of the data frame. The code below splits the data into four chunks of roughly equal size.

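# rows per chunk, then a chunk id (1 to 4) for every row
num <- ceiling(nrow(Grunfeld) / 4)
chunks <- ceiling(seq_len(nrow(Grunfeld)) / num)
fm <- value ~ capital

# split the data by chunk id and fit the within model on each piece
df_list <- split(Grunfeld, chunks)
fix <- lapply(df_list, function(df) plm(fm, data = df, model = "within"))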
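
To compare the chunk fits side by side, the estimates can be pulled out of the list into a small table. This is only a minimal sketch: it assumes the chunks are plain data frames with index = c("firm", "year") passed to plm directly, and chunk_summary is a hypothetical name used for illustration.

```r
library(plm)
data("Grunfeld", package = "plm")

# Same row-based chunking as above: four chunks of about equal size
num <- ceiling(nrow(Grunfeld) / 4)
chunks <- ceiling(seq_len(nrow(Grunfeld)) / num)
fm <- value ~ capital

df_list <- split(Grunfeld, chunks)
fix <- lapply(df_list, function(df) {
  plm(fm, data = df, index = c("firm", "year"), model = "within")
})

# One row per chunk: observations used, slope on capital, its standard error
# (chunk_summary is a hypothetical name, not part of plm)
chunk_summary <- do.call(rbind, lapply(names(fix), function(nm) {
  fit <- fix[[nm]]
  data.frame(
    chunk     = nm,
    n_obs     = nobs(fit),
    estimate  = coef(fit)[["capital"]],
    std_error = sqrt(vcov(fit)["capital", "capital"])
  )
}))

print(chunk_summary)
```

Note that these are four separate estimates, and averaging them does not in general reproduce the full-data fit. With row-based chunks a firm can also end up in two chunks (with 50-row chunks of Grunfeld, firm 3 lands partly in chunk 1 and partly in chunk 2), so its fixed effect is estimated twice from partial data.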

An alternative to split + lapply is by():

num <- ceiling(nrow(Grunfeld) / 4)
chunks <- ceiling(1:nrow(Grunfeld) / num)
fm <- value ~ capital

fix <- by(Grunfeld, chunks, function(df) plm(fm, data=df, model = "within"))
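
For the original goal of one result that matches the full-data fit, one possible route is sketched below. It is not part of the answer above and covers only the plain within estimator with a single regressor: keep every firm whole inside one chunk, demean within firm, and add up the cross-products over the chunks. The names firm_chunk, chunk_stats and total are hypothetical helpers, and the round-robin assignment of firms to chunks is an assumption made for illustration.

```r
data("Grunfeld", package = "plm")

# Assign whole firms to chunks so that no firm is split across chunks
firms <- unique(Grunfeld$firm)
firm_chunk <- setNames(rep_len(1:4, length(firms)), firms)
chunk_list <- split(Grunfeld, firm_chunk[as.character(Grunfeld$firm)])

# Per chunk: demean value and capital within each firm and keep only
# the sums needed later, so the chunk itself can be discarded afterwards
chunk_stats <- function(df) {
  y <- df$value - ave(df$value, df$firm)
  x <- df$capital - ave(df$capital, df$firm)
  list(xx = sum(x * x), xy = sum(x * y), yy = sum(y * y),
       n = nrow(df), n_firms = length(unique(df$firm)))
}

stats <- lapply(chunk_list, chunk_stats)
total <- function(name) sum(sapply(stats, `[[`, name))

# Within estimator and its usual summary quantities from the pooled sums
beta     <- total("xy") / total("xx")
rss      <- total("yy") - beta * total("xy")
df_resid <- total("n") - total("n_firms") - 1
se       <- sqrt(rss / df_resid / total("xx"))
r2       <- 1 - rss / total("yy")

print(c(estimate = beta, std_error = se, r_squared = r2, df = df_resid))
```

Because the demeaning is done firm by firm, the summed cross-products are the same as those of the full data set, so beta and se should agree with plm(fm, data = Grunfeld, model = "within"). With several regressors the scalar sums would become crossprod() matrices. This does not carry over automatically to the spatial models in splm, where a weights matrix links observations across units.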