I would like to know why when I use stepGAICAll.A() function within a for-loop
gives different results compared with those obtained when I use it without a for-loop (of course, setting the same function arguments).
The data used for explaining the problem can be loaded after gamlss package is loaded:
library(gamlss)
data('dbhh')
df <- dbhh[1:100,] # this is a subset for simplifying calculations
dfm <- transform(df, lage=log(age), lht=log(ht))
head(dfm)
The new data frame dfm
is as follows:
head age ht lage lht
1 33.6 0.03 53.3 -3.506558 3.975936
2 33.6 0.04 50.8 -3.218876 3.927896
3 33.7 0.04 50.1 -3.218876 3.914021
4 35.0 0.04 53.5 -3.218876 3.979682
5 36.1 0.04 52.1 -3.218876 3.953165
6 36.6 0.05 50.5 -2.995732 3.921973
First case: using stepGAICAll.A() without a for-loop
If I specify one distribution (in this case “BCTo”) and apply stepGAICAll.A() function:
options(warn=1)
nC <- 4 # this is the number of CPU cores
m0_1 <- gamlss(head ~ 1, data=dfm, family=BCTo, n.cyc=800) #mu.start=30, sigma.start=1, nu.start=1, tau.start=1
model_1 <- stepGAICAll.A(m0_1,
scope=list(lower = ~ 1, upper = ~ lage + lht),
k=2, parallel='multicore', ncpus=nC)
the results (after executing summary(model_1)
) are as follows:
******************************************************************
Family: c("BCTo", "Box-Cox-t-orig.")
Call: gamlss(formula = head ~ lht + lage, sigma.formula = ~lht,
nu.formula = ~1, tau.formula = ~lage + lht, family = BCTo,
data = dfm, n.cyc = 800, trace = FALSE)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.84363 0.12370 14.90 < 2e-16 ***
lht 0.48800 0.03058 15.96 < 2e-16 ***
lage 0.06683 0.01208 5.53 3.04e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.208 11.098 2.181 0.0317 *
lht -7.007 2.797 -2.506 0.0140 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Nu link function: identity
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.857 4.086 -1.434 0.155
------------------------------------------------------------------
Tau link function: log
Tau Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 71.058 71.051 1.000 0.320
lage -7.393 6.707 -1.102 0.273
lht -21.912 16.311 -1.343 0.183
------------------------------------------------------------------
No. of observations in the fit: 100
Degrees of Freedom for the fit: 9
Residual Deg. of Freedom: 91
at cycle: 8
Global Deviance: 287.9476
AIC: 305.9476
SBC: 329.3941
******************************************************************
Second case: using stepGAICAll.A() within a for-loop
If I try the same for a group of distributions, including “BCTo”
pdf_2 <- c('BCTo', 'BCPEo')
m0_2 <- list()
model_2 <- list()
n_2 <- length(pdf_2)
for (i in 1:n_2) {
m0_2[[pdf_2[i]]] <- gamlss(head ~ 1, data=dfm, family=pdf_2[i], n.cyc=800) #mu.start=30, sigma.start=1, nu.start=1, tau.start=1
model_2[[pdf_2[i]]] <- stepGAICAll.A(m0_2[[pdf_2[i]]],
scope = list(lower = ~ 1,
upper = ~ lage + lht),
k=2, parallel='multicore', ncpus=nC)
}
then results (after executing summary(model_2[['BCTo']])
) differ regarding the previously calculated using only “BCTo”:
******************************************************************
Family: c("BCTo", "Box-Cox-t-orig.")
Call: gamlss(formula = head ~ lht + lage, sigma.formula = ~lht,
nu.formula = ~1, tau.formula = ~lage, family = pdf_2[i],
data = dfm, n.cyc = 800, trace = FALSE)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.82189 0.66376 2.745 0.007281 **
lht 0.49403 0.15754 3.136 0.002301 **
lage 0.06760 0.01841 3.671 0.000405 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.743 11.644 1.438 0.1539
lht -5.129 2.930 -1.750 0.0834 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
------------------------------------------------------------------
Nu link function: identity
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.498 10.255 -0.634 0.528
------------------------------------------------------------------
Tau link function: log
Tau Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -22.819 32.273 -0.707 0.481
lage -9.804 12.998 -0.754 0.453
------------------------------------------------------------------
No. of observations in the fit: 100
Degrees of Freedom for the fit: 8
Residual Deg. of Freedom: 92
at cycle: 7
Global Deviance: 290.2502
AIC: 306.2502
SBC: 327.0916
******************************************************************
In this second case only the variable lage
was selected by stepGAICAll.A() for modeling the tau parameter (see tau.formula = ~lage
), while in the first case lage
and lht
were selected (tau.formula = ~lage + lht
). I do not understand why this happens if all the function arguments are exactly the same for both cases. The gamlss() function also has arguments that allow to set initial values for the distribution parameters, but if they are used (e.g. mu.start=30
, sigma.start=1
, nu.start=1
, tau.start=1
) then results also differ.
I would appreciate some help or comments. Thanks a lot in advance.
P.S. I hope my explanation be understood. I apologize for my English, I am still learning.