partykit::mob() + mlogit: `Error no suitable fitting function specified`

250 Views Asked by At

I am trying to fit a conditional logit using mlogit::mlogit() at the end leaves of the tree generated by the MOB algorithm partykit::mob(). Apparently, it cannot be made directly using the partykit::mob() function (below my attempts). However, I found the LORET algorithm, but I couldn't find any documentation with examples, so I tried guessing which function I need from the source code, but unfortunately, I couldn't make it work.

Do you know how (1) where I could find examples for the LORET library and (2) if it is possible to use the partykit:mob() function to work together with mlogit::mlogit? Thanks in advance.


Example data

For illustration, please gently consider the following data. It represents data from 5 individuals (id_ind ) that choose among 3 alternatives (altern). Each of the five individuals chose three times; hence we have 15 choice situations (id_choice). Each alternative is represented by two generic attributes (x1 and x2), and the choices are registered in y (1 if selected, 0 otherwise). Finally, z1 is a candidate partition variable.

df <- read.table(header = TRUE, text = "
id_ind id_choice altern           x1          x2 y
1       1         1      1  1.586788801  0.11887832 1
2       1         1      2 -0.937965347  1.15742493 0
3       1         1      3 -0.511504401 -1.90667519 0
4       1         2      1  1.079365680 -0.37267925 0
5       1         2      2 -0.009203032  1.65150370 1
6       1         2      3  0.870474033 -0.82558651 0
7       1         3      1 -0.638604013 -0.09459502 0
8       1         3      2 -0.071679538  1.56879334 0
9       1         3      3  0.398263302  1.45735788 1
10      2         4      1  0.291413453 -0.09107974 0
11      2         4      2  1.632831160  0.92925495 0
12      2         4      3 -1.193272276  0.77092623 1
13      2         5      1  1.967624379 -0.16373709 1
14      2         5      2 -0.479859282 -0.67042130 0
15      2         5      3  1.109780885  0.60348187 0
16      2         6      1 -0.025834772 -0.44004183 0
17      2         6      2 -1.255129594  1.10928280 0
18      2         6      3  1.309493274  1.84247199 1
19      3         7      1  1.593558740 -0.08952151 0
20      3         7      2  1.778701074  1.44483791 1
21      3         7      3  0.643191170 -0.24761157 0
22      3         8      1  1.738820924 -0.96793288 0
23      3         8      2 -1.151429915 -0.08581901 0
24      3         8      3  0.606695064  1.06524268 1
25      3         9      1  0.673866953 -0.26136206 0
26      3         9      2  1.176959443  0.85005871 1
27      3         9      3 -1.568225496 -0.40002252 0
28      4        10      1  0.516456176 -1.02081089 1
29      4        10      2 -1.752854918 -1.71728381 0
30      4        10      3 -1.176101700 -1.60213536 0
31      4        11      1 -1.497779616 -1.66301234 0
32      4        11      2 -0.931117325  1.50128532 1
33      4        11      3 -0.455543630 -0.64370825 0
34      4        12      1  0.894843784 -0.69859139 0
35      4        12      2 -0.354902281  1.02834859 0
36      4        12      3  1.283785176 -1.18923098 1
37      5        13      1 -1.293772990 -0.73491317 0
38      5        13      2  0.748091387  0.07453705 1
39      5        13      3 -0.463585127  0.64802031 0
40      5        14      1 -1.946438667  1.35776140 0
41      5        14      2 -0.470448172 -0.61326604 1
42      5        14      3  1.478763383 -0.66490028 0
43      5        15      1  0.588240775  0.84448489 1
44      5        15      2  1.131731049 -1.51323232 0
45      5        15      3  0.212145247 -1.01804594 0
")
df$z1 <- rnorm(n= nrow(df),mean = 0,sd = 1)

mlogit + partykit::mob()

library(mlogit)
library(partykit)
mo <-  mlogit(formula =  y ~ x1 + x2 , 
              data =  df,
              idx  =  c("id_choice", "altern"))
# Coefficients:
#   (Intercept):2  (Intercept):3             x1             x2  
#        0.036497       0.293254       0.821173       1.062794

mlogit_function <-  function(y, x,
                             offset = NULL,
                             ...){ mlogit(y ~  x ,
                                          data =  df)}
formula <-  y ~ x1 + x2 | z1 
mob(formula = formula,
    data    = df,
    fit     = mlogit_function,
    control = mob_control(minsize = 10, 
                          alpha = 0.01))
# Error in mob(formula = formula, data = df, fit = mlogit_function, control = mob_control(minsize = 10,  
# no suitable fitting function specified

mlogit + loret::multinomtree()

This function runs the tree, but it is not what I want because there is a missing constant for alternative 2.

loret::multinomtree(formula = formula,
                    data = df)
# Model-based recursive partitioning (NULL)
# Model formula:
#   y ~ x1 + x2 | z1
# 
# Fitted party:
#   [1] root: n = 45
# 1:(Intercept)          1:x1          1:x2 
# -1.1046317     0.7663315     1.0418296  
# 
# Number of inner nodes:    0
# Number of terminal nodes: 1
# Number of parameters per node: 3
# Objective function: 22.62393

mlogit + loret::clmtree()

loret::clmtree(formula = formula,
               data = df)
# Error in clm.fit.default(y = c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L,  : 
#                                  is.list(y) is not TRUE
2

There are 2 best solutions below

6
On BEST ANSWER

I don't have a complete solution but hopefully enough feedback to get you started:

  • It is important to distinguish so-called "conditional logit" models with alternative-specific variables (which you are interested in) and classic "multinomial logit models" with only subject-specific (or individual-specific) variables. mlogit::mlogit() can fit both (and also mixed versions) while nnet::multinom() only supports the latter.

  • For fitting conditional logit models you could have the data in long vs. wide form. You specify your data in long form and also have a splitting variable z1 that is alternative-specific. This means that data from the same individual could end up in different nodes of the tree which would be rather awkward.

  • Instead it would be better to have the data in wide form so that each row corresponds to an individual and then you could only consider individual-specific variables for splitting. This would also match the view of the $gradient element of a fittedmlogit object which provides the individual-specific gradient contributions. (This is what sandwich::estfun() extracts which in turns is the essential information for partykit::mob().)

  • It might also be possible to do sensible recursive partitioning based on alternative-specific variables but I find it hard to see what kind of models this would yield and what these would mean. In any case, you then would have to write your own code to extract the estfun from the fitted-model object that provides the fully disaggregated alternative-specific gradient contributions.

  • The loret package is somewhat unfinished and hasn't been updated in quite a while. Hence, I would not recommend using it "in production" at the moment. Also nnet::multinom() (underlying loret::multinomtree()) does not fit the model you need (as mentioned above) and ordinal::clm() (underlying loret::clmtree()) is for a completely different model.

  • One specific aspect that we wanted to build into loret but did not finish yet, is automatic detection of (quasi-)complete separation in logistic models and handling it appropriately in the tree.

  • Your mlogit + partykit::mob() approach does not work because the fitting function does not have the right interface (as you are correctly informed). See vignette("mob", package = "partykit") for the two supported interfaces.

  • To write an appropriate interface you need to make sure that you have all the variables you need in each subset. Note that the response variable plus regressor matrix is not enough for this but you need the index variables as well! I would recommend to include these variables either through the y variables or the x variables of the formula specified for partykit::mob(). In mob_control() you can then set ytype = "data.frame" and xtype = "data.frame". Then both y and x are provided as data.frame objects and can be combined again prior to calling mlogit::mlogit(). The formula and idx arguments for mlogit() have to be provided in some way then. In the example below I have hard-coded them.

Illustration based on your example: You can set up a model-fitting function my_fit() that expects y and x to be data frames and then uses the formula = y ~ x1 + x2 and idx = c("id_choice", "altern") to fit the mlogit() model.

my_fit <- function(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ...) {
  mlogit::mlogit(formula = y ~ x1 + x2, data = cbind(y, x), idx = c("id_choice", "altern"))
}

To specifify the tree you then need to pass all required variables for the mlogit() model through the x variables of the mob() formula:

tr <- mob(y ~ x1 + x2 + id_choice + altern | z1, data = df, fit = my_fit,
  control = mob_control(ytype = "data.frame", xtype = "data.frame"))

Superficially, this works. At least it fits the desired model in the root node as you can see by checking coef(tree). However, the only reason it works is that it does not even try to do any partitioning as the sample size is judged to be too small compared to the number of parameters.

But when you additionally set minsize = 10 in mob_control() then the partitioning process will fail. The reason for this is that the splitting variable is of length 45 but the gradient/estfun from mlogit() is only of length 15. This is the long alternative-specific format vs. the short individual-specific format.

Thus, to make things work you have to use the data in wide form and then adapt the mob() formula and the mlogit() call inside my_fit() accordingly.

0
On

Data preparation

Here's a workaround to make work mlogit::mlogit together with partykit::mob() function based on Achim's answer. First, we need to use the data in the wide format to properly parse the score functions when the partykit::mob() function extracts it using estfun. Additionally, I generated a partition variable (z1) at the level of individuals (which was a mistake I made in my original question (!)).


# First generate the choice variable in the wide format
df$y_wide<- with(df, 
                 ave(x = altern * y , 
                     by = id_choice, 
                     FUN = max))
# drop the long formate choice variable
df <- subset( df, select = -y )
# reshape the data.
df_wide <- reshape(df, 
                   idvar = "id_choice", 
                   timevar = "altern", 
                   direction = "wide",
                   v.names = c("x1", "x2"))
# Add the partition variable
set.seed(777)
# Here I am generating a partition variable at the individual level.
# I made a mistake in my original question generating the partition variable 
# at the choice situation level.
df_wide$z1 <- rep(rnorm(max(df_wide$id_ind)),each = 3)


head(df_wide)
#    id_ind id_choice y_wide        x1.1        x2.1         x1.2       x2.2       x1.3       x2.3         z1
# 1       1         1      1  1.58678880  0.11887832 -0.937965347  1.1574249 -0.5115044 -1.9066752  0.4897862
# 4       1         2      2  1.07936568 -0.37267925 -0.009203032  1.6515037  0.8704740 -0.8255865  0.4897862
# 7       1         3      3 -0.63860401 -0.09459502 -0.071679538  1.5687933  0.3982633  1.4573579  0.4897862
# 10      2         4      3  0.29141345 -0.09107974  1.632831160  0.9292550 -1.1932723  0.7709262 -0.3985414
# 13      2         5      1  1.96762438 -0.16373709 -0.479859282 -0.6704213  1.1097809  0.6034819 -0.3985414
# 16      2         6      3 -0.02583477 -0.44004183 -1.255129594  1.1092828  1.3094933  1.8424720 -0.3985414

Implementing the MOB algorithm together with mlogit.

1st stage: The fit function my_fit_for_mlogit().

Tweaking the fit function supplied to partykit::mob, we present the data in wide format, but we "format it" to be used by mlogit::mlogit() by using the dfidx() function. This way to proceed means that there is a hidden stats::reshape() process working behind the scenes, so if the data set is huge, this will probably slow down the overall process.

library(mlogit)
library(partykit)
#### First define the function sketched by Achim in his comment.
my_fit_for_mlogit <- function(y, 
                              x = NULL, 
                              start = NULL, 
                              weights = NULL, 
                              offset = NULL, ...) {

  #First declare dfidx data 
  xy <- cbind(y,x)
  # Properly declare the data to be used by mlogit.
  d <- dfidx(data    = xy, 
             shape   = "wide", 
             choice  = "y_wide",
             varying = 2:7,
             sep     = ".",
             idx     = list(c("id_choice", "id_ind")), 
             idnames = c(NA, "altern"))
  
  # fit mlogit using data equal "d"
  model <- mlogit::mlogit(formula = y_wide ~ x1 + x2|1 ,
                         data    =  d)

    return(model)
}

2nd stage: Using the partykit::mob() function

Below we invoke the my_fit_for_mlogit() inside of the mob() function, using a very large alpha (alpha = 0.99) just for the sake of illustration. Here is important to note that we have to include all the explanatory variables in the wide format (e.g x1.1 + x2.1 + x1.2) together with the indexing variables, choice situation variable (id_choice), and individuals index variable (id_ind) and the partition variable (z1) is placed after the | symbol.

# the trick is to include all the variables in wide format and then
# let dfidx() do the magic INSIDE the my_fit_for_mlogit() function. 


tr <- mob(formula = y_wide ~  
                    x1.1 + x2.1 + x1.2 + 
                    x2.2 + x1.3 + x2.3 + 
                    id_choice + id_ind + 0 | z1, 
          data    = df_wide, 
          fit     = my_fit_for_mlogit,
          cluster = id_ind, 
          control = mob_control(ytype = "data.frame", 
                                xtype = "data.frame",
                                alpha = 0.99, # This is just for illustration
                                minsize = 6))

# Model-based recursive partitioning (my_fit_for_mlogit)
# 
# Model formula:
#   y_wide ~ x1.1 + x2.1 + x1.2 + x2.2 + x1.3 + x2.3 + id_choice + 
#   id_ind + 0 | z1
# 
# Fitted party:
#   [1] root
# |   [2] z1 <= 0.48979: n = 9
# |       (Intercept):2 (Intercept):3            x1            x2 
# |          -3.3989117     0.3696964     0.9647511     2.5129869 
# |   [3] z1 > 0.48979: n = 6
# |       (Intercept):2 (Intercept):3            x1            x2 
# |            45.77874     -11.51773      21.72540      31.29221 
# 
# Number of inner nodes:    1
# Number of terminal nodes: 2
# Number of parameters per node: 4
# Objective function: 4.605233