I have a dataset with multiple observations per customer and want to fit a multinomial regression in R. To account for the multiple observations per customer, I need cluster-adjusted standard errors.
To do this I use the mlogit and clusterSEs packages.
My first step was to convert my original data frame, which is in wide shape, with mlogit.data:
mlMASTER_DATA <- mlogit.data(MASTER_DATA, shape = "wide", choice="Booking_status")
After that I created my model:
mnlModel_P3 <- mlogit(Booking_status ~ 1 | logprevious_bookings + logsearches,
data=mlMASTER_DATA, reflevel = "is_booked_24h", na.action = na.exclude)
The model runs normally.
In a third step, I want to compute the cluster-adjusted standard errors:
Cluster_model <- cluster.im.mlogit(mnlModel_P3, mlMASTER_DATA, ~user_id)
However, I get the following error message:
error in `[.data.frame`(as.data.frame(x), i, j, drop = drop) :
undefined columns selected
Can anybody help with this issue? Many thanks!
You may need to convert your data.frame with dfidx() from the dfidx package, which embeds an index holding the observation and cluster identifiers in the data frame, before calling cluster.im.mlogit().
For an example, first run help(cluster.im.mlogit), then scroll to the bottom of the help pane and look at the examples. Here's the one I would start with:
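The example I mean is the heating-system one. Below is a sketch of its data-preparation step as I recall it from the help file; the column positions 3:12 and the idcase/region index pair come from that example, so check them against the help page of your installed version.

```r
library(mlogit)
library(dfidx)

data("Heating", package = "mlogit")
H <- Heating
# the cluster variable is converted to numeric, as in the help example
H$region <- as.numeric(H$region)

# shape = "wide" describes the input: one row per choice situation.
# idx = list(c("idcase", "region")) nests the observation id in the cluster id.
H.ml <- dfidx(H, shape = "wide", choice = "depvar", varying = 3:12,
              idx = list(c("idcase", "region")))
head(H.ml)
```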
This reshapes the wide-format H df into the indexed long format that mlogit expects and adds an idx column. idx is really a two-part index, by observation and by cluster (as I understand it). You can then run your mlogit with this new df:
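A sketch of the model step, again following the Heating example as I recall it. The formula depvar ~ ic + oc belongs to that example and is not your model; the data preparation is repeated so the snippet runs on its own.

```r
library(mlogit)
library(dfidx)

# data preparation, as in the help example
data("Heating", package = "mlogit")
H <- Heating
H$region <- as.numeric(H$region)
H.ml <- dfidx(H, shape = "wide", choice = "depvar", varying = 3:12,
              idx = list(c("idcase", "region")))

# installation cost (ic) and operating cost (oc) vary by alternative
m <- mlogit(depvar ~ ic + oc, H.ml)
summary(m)
```

For your data the formula would stay Booking_status ~ 1 | logprevious_bookings + logsearches; only the data argument changes to the dfidx object.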
Next run cluster.im.mlogit() on the model output and the df, with your specified cluster (which is embedded in idx). Per the help file again:
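A sketch of that last step, under the same assumptions as the help example (Heating data, region as the cluster variable). The earlier steps are repeated so the snippet is self-contained; I have not checked it against every version of clusterSEs and dfidx.

```r
library(mlogit)
library(dfidx)
library(clusterSEs)

# data preparation and model, as in the help example
data("Heating", package = "mlogit")
H <- Heating
H$region <- as.numeric(H$region)
H.ml <- dfidx(H, shape = "wide", choice = "depvar", varying = 3:12,
              idx = list(c("idcase", "region")))
m <- mlogit(depvar ~ ic + oc, H.ml)

# cluster-adjusted p-values and confidence intervals, clustering on region
clust.mlogit <- cluster.im.mlogit(m, H.ml, ~ region)
```

In your case the cluster formula would be ~ user_id, with user_id supplied as the second element of the idx pair when you build the dfidx object.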