multinomial logit


I'm stuck running a multinomial logit regression in R. A preview of the data is below for reference. How should I run it? I'm new to R and need to do this for applied econometrics using R. Can you help me reshape the data and run the multinomial regression?

> head(data)
  marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3 x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice
1           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      3
2           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
3           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      3
4           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
5           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2
6           1 7.459917        1 7.267866  6.67054        1 7.633743 8.444682        0 11.30016        0        0        0      2

Answer 1 (accepted)

A multinomial logit model can be run in R with several packages, including the nnet package (function multinom) and the mlogit package. The UCLA tutorial recommended by mhmtsrmn prefers multinom to mlogit

because it does not require the data to be reshaped (as the mlogit package does)

However, the data you provided are already in a wide format that mlogit (through dfidx) can read directly, so if you want to use mlogit, no reshaping is needed. You do, however, need to recode the choice column as follows:

  • Choice 2 must be changed to prod2
  • Choice 3 must be changed to prod3, and so on.

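This is necessary because the column names use prod0, prod1, prod2 and prod3 as the alternative labels (x1_prod2, x3_prod3, and so on), and the values in the choice column must match those labels.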
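To see why the labels must match, here is a rough base-R analogue of the wide-to-long step, using stats::reshape rather than dfidx (it is only an illustration, not how dfidx is implemented). The two rows are the first two rows of head(data) from the question; the names obs, alt and chosen are made up for this sketch.

```r
# Two rows from the question's head(data); prod0 attributes are all zero
wide <- data.frame(
  marketindex = c(1, 1),
  x1_prod1 = 7.459917, x2_prod1 = 1, x3_prod1 = 7.267866,
  x1_prod2 = 6.67054,  x2_prod2 = 1, x3_prod2 = 7.633743,
  x1_prod3 = 8.444682, x2_prod3 = 0, x3_prod3 = 11.30016,
  x1_prod0 = 0, x2_prod0 = 0, x3_prod0 = 0,
  choice = c(3, 2)
)

# 1. Recode 2 -> "prod2", 3 -> "prod3", ... so choices match the column suffixes
wide$choice <- paste0("prod", wide$choice)

# 2. Wide -> long: "x1_prod1" is split at "_" into variable x1 and alternative prod1,
#    giving one row per (observation, product)
long <- reshape(wide, direction = "long", varying = 2:13, sep = "_",
                timevar = "alt", idvar = "obs", ids = seq_len(nrow(wide)))

# 3. Flag the alternative that was actually chosen (only works if labels match)
long$chosen <- long$choice == long$alt
long <- long[order(long$obs, long$alt), c("obs", "alt", "x1", "x2", "x3", "chosen")]
print(long, row.names = FALSE)
```

If choice were left as 2 and 3, the comparison in step 3 would never be TRUE, which is the same mismatch mlogit would run into.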

I tried running mlogit on your data sample, but it failed, most probably because the sample has no variation: apart from choice, all six rows are identical. So I replaced the values with random ones, stored the data frame as choice_dat, and put the recoded choices in a new column, choice1:

choice_dat
 marketindex x1_prod1 x2_prod1 x3_prod1 x1_prod2 x2_prod2 x3_prod2 x1_prod3
1           1        5        7        6        5        2        8        7
2           1        8        3        5        6        3        9        8
3           1        7       10        3        7        6        9        9
4           1        8        8        2        5        8        9        7
5           1        9        9       10        8        4        6        8
6           1        7        4        8        7       10       10        8
  x2_prod3 x3_prod3 x1_prod0 x2_prod0 x3_prod0 choice1
1       10       13        0        0        0   prod3
2        3       10        0        0        0   prod2
3        4       10        0        0        0   prod3
4        1       11        0        0        0   prod2
5        8       10        0        0        0   prod2
6        5       12        0        0        0   prod2
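A quick base-R check makes the lack of variation concrete. This is a sketch with the covariate values typed in by hand from the two previews (orig_x from the question's head(data), new_x from choice_dat); it simply counts distinct covariate rows in each sample.

```r
# The 12 x-columns of the question's head(data): all six rows are identical
orig_x <- matrix(c(7.459917, 1, 7.267866, 6.67054, 1, 7.633743,
                   8.444682, 0, 11.30016, 0, 0, 0),
                 nrow = 6, ncol = 12, byrow = TRUE)

# The 12 x-columns of choice_dat (random values)
new_x <- rbind(
  c(5, 7, 6, 5, 2, 8, 7, 10, 13, 0, 0, 0),
  c(8, 3, 5, 6, 3, 9, 8, 3, 10, 0, 0, 0),
  c(7, 10, 3, 7, 6, 9, 9, 4, 10, 0, 0, 0),
  c(8, 8, 2, 5, 8, 9, 7, 1, 11, 0, 0, 0),
  c(9, 9, 10, 8, 4, 6, 8, 8, 10, 0, 0, 0),
  c(7, 4, 8, 7, 10, 10, 8, 5, 12, 0, 0, 0)
)

# unique() on a matrix drops duplicated rows
n_orig <- nrow(unique(orig_x))
n_new  <- nrow(unique(new_x))
cat("distinct covariate rows in the question's sample:", n_orig, "\n")
cat("distinct covariate rows in choice_dat:", n_new, "\n")
```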

Then I ran mlogit on the data:

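library(dfidx)
library(mlogit)
prod_dat <- dfidx(choice_dat, choice = "choice1", varying = 2:13, sep = "_")  # "x1_prod1" -> variable x1, alternative prod1
mod1 <- mlogit(choice1 ~ x1 + x2 + x3 | 0, data = prod_dat)  # "| 0": no alternative-specific intercepts
summary(mod1)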

Call:
mlogit(formula = choice1 ~ x1 + x2 + x3 | 0, data = prod_dat, 
    method = "nr")

Frequencies of alternatives:choice
  prod0   prod1   prod2   prod3 
0.00000 0.00000 0.66667 0.33333 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 9.53E-08 
gradient close to zero 

Coefficients :
   Estimate Std. Error z-value Pr(>|z|)
x1 -0.11412    0.38947 -0.2930   0.7695
x2  0.16461    0.17790  0.9253   0.3548
x3  0.26768    0.22651  1.1818   0.2373

Log-Likelihood: -5.8257
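To read these coefficients, it can help to compute the choice probabilities they imply. The sketch below does this by hand in base R for the first row of choice_dat, using the estimates printed above. Because the formula has "| 0", there are no alternative-specific constants, so the utility of product j is V_j = b1*x1_j + b2*x2_j + b3*x3_j, and the conditional logit probability is exp(V_j) / sum_k exp(V_k).

```r
# Estimates copied from the summary(mod1) output above
b <- c(x1 = -0.11412, x2 = 0.16461, x3 = 0.26768)

# Attributes (x1, x2, x3) of each product in the first row of choice_dat
X <- rbind(
  prod0 = c(0, 0, 0),
  prod1 = c(5, 7, 6),
  prod2 = c(5, 2, 8),
  prod3 = c(7, 10, 13)
)

V <- drop(X %*% b)          # deterministic utility of each product
p <- exp(V) / sum(exp(V))   # conditional logit (softmax) probabilities
print(round(p, 3))
```

For this row prod3, the alternative actually chosen, gets the largest probability.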
Answer 2

Here is a link to UCLA's example of multinomial logistic regression in R, which uses multinom from the nnet package. The formula syntax is the same as in base R's lm function.
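As a small illustration of that formula interface, here is a sketch using R's built-in iris data (chosen only because it ships with R; it is not the asker's data). nnet is a recommended package that is normally installed with R.

```r
library(nnet)

# The same response ~ predictors formula style as lm()
fit_lm <- lm(Sepal.Length ~ Sepal.Width, data = iris)
fit_mn <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris,
                   trace = FALSE)   # trace = FALSE silences the iteration log

# One row of coefficients per non-reference level of Species
print(coef(fit_mn))
```

The first factor level (setosa here) is the reference category, so the coefficient matrix has one row fewer than the number of outcome levels.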

Answer 3

Here's a multinom(...) example using your data.

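library(data.table)
library(nnet)
setDT(data)  # convert the data.frame to a data.table by reference
##
#   first method: for each row, copy the x1..x3 values of the chosen product
#   (e.g. x1_prod3, x2_prod3, x3_prod3 when choice == 3) into new columns x1, x2, x3
#
data[
  , c('x1', 'x2', 'x3'):=mget(sapply(1:3, function(x) sprintf('x%d_prod%d', x, choice)))
  , by=.(1:nrow(data))]  # group by row number so that choice is a single value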
fit.1 <- multinom(choice ~ x1 + x2 + x3, data)
fit.1
## Call:
## multinom(formula = choice ~ x1 + x2 + x3, data = data)
## 
## Coefficients:
## (Intercept)          x1          x2          x3 
##   -3.420470   -6.949344  -12.363971    6.679612 
##
## Residual Deviance: 0.0001212278 
## AIC: 4.000121 
##
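#   alternate method: melt to long, keep only the chosen product's attributes,
#   then cast back to one column per attribute
#
data.melt <- melt(data, measure.vars = patterns('_prod'))      # one row per original row and _prod column
data.melt[, prod.id:=gsub('^.+_prod(\\d+)$', '\\1',variable)]  # "x2_prod3" -> "3"
data.melt[, variable:=gsub('^(.+)_.+$', '\\1', variable)]      # "x2_prod3" -> "x2"
data.melt <- data.melt[choice==prod.id]                         # keep the chosen product only
data.melt[, id:=seq(.N), by=.(variable, choice)]                # row counter within each choice
mf <- dcast(data.melt, marketindex+choice+id~variable, value.var = 'value')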
fit.2 <- multinom(choice ~ x1+x2+x3, mf)
fit.2
## Call:
## multinom(formula = choice ~ x1 + x2 + x3, data = mf)
##
## Coefficients:
## (Intercept)          x1          x2          x3 
##   -3.420470   -6.949344  -12.363971    6.679612 
##
## Residual Deviance: 0.0001212278 
## AIC: 4.000121
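For readers without data.table, here is a base-R sketch of the same idea as the first method: for each row, look up the x1..x3 columns of the chosen product. The df below is rebuilt by hand from the question's head(data); the loop and matrix indexing are a swapped-in base-R technique, not the code from this answer.

```r
# Rebuild the question's head(data): six identical rows except for choice
row_vals <- c(marketindex = 1,
              x1_prod1 = 7.459917, x2_prod1 = 1, x3_prod1 = 7.267866,
              x1_prod2 = 6.67054,  x2_prod2 = 1, x3_prod2 = 7.633743,
              x1_prod3 = 8.444682, x2_prod3 = 0, x3_prod3 = 11.30016,
              x1_prod0 = 0, x2_prod0 = 0, x3_prod0 = 0)
df <- as.data.frame(matrix(row_vals, nrow = 6, ncol = length(row_vals),
                           byrow = TRUE, dimnames = list(NULL, names(row_vals))))
df$choice <- c(3, 2, 3, 2, 2, 2)

# For attribute k, pick column "xk_prodC" in each row, where C is that row's choice
for (k in 1:3) {
  cols <- sprintf("x%d_prod%d", k, df$choice)
  idx  <- cbind(seq_len(nrow(df)), match(cols, names(df)))  # (row, column) pairs
  df[[paste0("x", k)]] <- as.matrix(df)[idx]
}
print(df[, c("choice", "x1", "x2", "x3")])
```

Each choice value ends up with its own (x1, x2, x3) vector, and only choices 2 and 3 occur in this sample. That is why multinom reports a single row of coefficients (a binary logit) and an essentially zero residual deviance: the regressors perfectly separate the two choices, so the estimates above should not be interpreted.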