How can I estimate a latent class logit model in R?

I am new to using R. I am trying to estimate a latent class logit model using panel data. I tried following this example: https://rpubs.com/msarrias1986/335556.
I was told that the following code should work:

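# Reshape the wide choice data to long format;
# columns 3:17 hold the alternative-specific attributes
df01 <- mlogit.data(data,
                    id = "ID",
                    choice = "Choice",
                    varying = 3:17,
                    shape = "wide",
                    sep = "")

# Latent class logit with Q = 3 classes; the fifth formula part (1)
# puts only a constant in the class-assignment model
lc <- gmnl(Choice ~ COST + REN + NUCL + OUTAGE180 + OUTAGE360 | 0 | 0 | 0 | 1,
           data = df01,
           model = 'lc',
           Q = 3,
           panel = TRUE,
           method = "bhhh")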

With a basic datafile of 17 columns (see image), it works. However, when I add one more column, for example a dummy variable for gender, I get 2 errors:

  1. In the first command, I get the error "Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, : 'varying' arguments must be the same length". The error goes away if I write 'varying = list(3:18)' instead of 'varying = 3:18', but I'm not sure this is a correct way to deal with it.

  2. In the second command, I get the error "Error in eval(predvars, data, env) : object 'COST' not found". 'COST' is indeed not a column in my data, but 'COST_1' (i.e. the cost of the first alternative), 'COST_2' and 'COST_3' are. I want the coefficient for 'COST' to represent the importance of cost in choosing an alternative. The same applies to all other variables.

I find it curious that just adding 1 column to the datafile causes these errors. I hope someone has some good advice. Thanks for helping!

(example of my data in the included image).


There are 1 best solutions below


I kept the argument 'varying = 3:17' and changed the code to:

# Same reshape as in the question, now with explicit alt.levels
df01 <- mlogit.data(data,
                    id = "ID",
                    choice = "Choice",
                    varying = 3:17,
                    shape = "wide",
                    sep = "",
                    alt.levels = c("FOSS","REN","NUCL","COST","OUTAGE"))

# MALE is added in the second formula part
lc <- gmnl(Choice ~ COST + REN + NUCL + OUTAGE | MALE | 0 | 0 | 1,
           data = df01,
           model = 'lc',
           Q = 3,
           panel = TRUE,
           method = "bhhh")

With fewer than 13 individual variables, this seems to work.
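
The first error from the question can be reproduced with base R's reshape(), which is where the reshapeLong message comes from. The following is only a minimal sketch: the toy data frame, the column names (COST_1 ... REN_3, MALE) and the "_" separator are assumptions made for illustration and are not taken from the original data file. It suggests that a column which does not follow the "attribute + alternative" naming pattern should be left out of 'varying'.

```r
# Toy wide data (hypothetical): two attributes for three alternatives,
# plus one individual-specific dummy (MALE)
wide <- data.frame(
  ID     = 1:2,
  Choice = c(1, 3),
  COST_1 = c(10, 12), COST_2 = c(11, 9), COST_3 = c(8, 14),
  REN_1  = c(0, 1),   REN_2  = c(1, 0),  REN_3  = c(1, 1),
  MALE   = c(1, 0)
)

# Including MALE (column 9) in 'varying': the names no longer split into
# groups of equal size, so reshape() stops with an error
bad <- tryCatch(
  reshape(wide, direction = "long", varying = 3:9, sep = "_", idvar = "ID"),
  error = function(e) conditionMessage(e)
)
print(bad)

# Restricting 'varying' to the alternative-specific columns works:
# COST_1..COST_3 collapse into COST, and MALE is repeated on every row
long <- reshape(wide, direction = "long", varying = 3:8, sep = "_",
                idvar = "ID", timevar = "alt")
print(long)
```

If the same logic carries over to mlogit.data(), the gender dummy would stay outside 'varying' (so still 'varying = 3:17' when the new column is column 18). It would also explain the second error: wrapping the indices in list() skips the name-based grouping, so no column called 'COST' is ever created in the long data.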