Why results of stepGAICAll.A() function of GAMLSS package are different when it is used within a for-loop?

130 Views Asked by At

I would like to know why when I use stepGAICAll.A() function within a for-loop gives different results compared with those obtained when I use it without a for-loop (of course, setting the same function arguments).

The data used for explaining the problem can be loaded after gamlss package is loaded:

library(gamlss)
data('dbhh')
df <- dbhh[1:100,] # this is a subset for simplifying calculations
dfm <- transform(df, lage=log(age), lht=log(ht))
head(dfm)

The new data frame dfm is as follows:

  head  age   ht      lage      lht
1 33.6 0.03 53.3 -3.506558 3.975936
2 33.6 0.04 50.8 -3.218876 3.927896
3 33.7 0.04 50.1 -3.218876 3.914021
4 35.0 0.04 53.5 -3.218876 3.979682
5 36.1 0.04 52.1 -3.218876 3.953165
6 36.6 0.05 50.5 -2.995732 3.921973

First case: using stepGAICAll.A() without a for-loop

If I specify one distribution (in this case “BCTo”) and apply stepGAICAll.A() function:

options(warn=1)
nC <- 4 # this is the number of CPU cores
m0_1 <- gamlss(head ~ 1, data=dfm, family=BCTo, n.cyc=800) #mu.start=30, sigma.start=1, nu.start=1, tau.start=1
model_1 <- stepGAICAll.A(m0_1, 
                          scope=list(lower = ~ 1, upper = ~ lage + lht),  
                          k=2, parallel='multicore', ncpus=nC)

the results (after executing summary(model_1)) are as follows:

******************************************************************
Family:  c("BCTo", "Box-Cox-t-orig.") 

Call:  gamlss(formula = head ~ lht + lage, sigma.formula = ~lht,  
    nu.formula = ~1, tau.formula = ~lage + lht, family = BCTo,  
    data = dfm, n.cyc = 800, trace = FALSE) 

Fitting method: RS() 

------------------------------------------------------------------
Mu link function:  log
Mu Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.84363    0.12370   14.90  < 2e-16 ***
lht          0.48800    0.03058   15.96  < 2e-16 ***
lage         0.06683    0.01208    5.53 3.04e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
Sigma link function:  log
Sigma Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   24.208     11.098   2.181   0.0317 *
lht           -7.007      2.797  -2.506   0.0140 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
Nu link function:  identity 
Nu Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -5.857      4.086  -1.434    0.155

------------------------------------------------------------------
Tau link function:  log 
Tau Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   71.058     71.051   1.000    0.320
lage          -7.393      6.707  -1.102    0.273
lht          -21.912     16.311  -1.343    0.183

------------------------------------------------------------------
No. of observations in the fit:  100 
Degrees of Freedom for the fit:  9
      Residual Deg. of Freedom:  91 
                      at cycle:  8 
 
Global Deviance:     287.9476 
            AIC:     305.9476 
            SBC:     329.3941 
******************************************************************

Second case: using stepGAICAll.A() within a for-loop

If I try the same for a group of distributions, including “BCTo”

pdf_2 <- c('BCTo', 'BCPEo')
m0_2 <- list()
model_2 <- list()
n_2 <- length(pdf_2)
for (i in 1:n_2) {
    m0_2[[pdf_2[i]]] <- gamlss(head ~ 1, data=dfm, family=pdf_2[i], n.cyc=800) #mu.start=30, sigma.start=1, nu.start=1, tau.start=1
    model_2[[pdf_2[i]]] <- stepGAICAll.A(m0_2[[pdf_2[i]]], 
                          scope = list(lower = ~ 1, 
                                       upper = ~ lage + lht), 
                          k=2, parallel='multicore', ncpus=nC)
}

then results (after executing summary(model_2[['BCTo']])) differ regarding the previously calculated using only “BCTo”:

******************************************************************
Family:  c("BCTo", "Box-Cox-t-orig.") 

Call:  gamlss(formula = head ~ lht + lage, sigma.formula = ~lht,  
    nu.formula = ~1, tau.formula = ~lage, family = pdf_2[i],  
    data = dfm, n.cyc = 800, trace = FALSE) 

Fitting method: RS() 

------------------------------------------------------------------
Mu link function:  log
Mu Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.82189    0.66376   2.745 0.007281 ** 
lht          0.49403    0.15754   3.136 0.002301 ** 
lage         0.06760    0.01841   3.671 0.000405 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
Sigma link function:  log
Sigma Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   16.743     11.644   1.438   0.1539  
lht           -5.129      2.930  -1.750   0.0834 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

------------------------------------------------------------------
Nu link function:  identity 
Nu Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -6.498     10.255  -0.634    0.528

------------------------------------------------------------------
Tau link function:  log 
Tau Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -22.819     32.273  -0.707    0.481
lage          -9.804     12.998  -0.754    0.453

------------------------------------------------------------------
No. of observations in the fit:  100 
Degrees of Freedom for the fit:  8
      Residual Deg. of Freedom:  92 
                      at cycle:  7 
 
Global Deviance:     290.2502 
            AIC:     306.2502 
            SBC:     327.0916 
******************************************************************

In this second case only the variable lage was selected by stepGAICAll.A() for modeling the tau parameter (see tau.formula = ~lage), while in the first case lage and lht were selected (tau.formula = ~lage + lht). I do not understand why this happens if all the function arguments are exactly the same for both cases. The gamlss() function also has arguments that allow to set initial values for the distribution parameters, but if they are used (e.g. mu.start=30, sigma.start=1, nu.start=1, tau.start=1) then results also differ.

I would appreciate some help or comments. Thanks a lot in advance.

P.S. I hope my explanation be understood. I apologize for my English, I am still learning.

0

There are 0 best solutions below