I'm trying to loop over multiple multilevel ordered logistic regressions with random intercepts on the country
, dropping one observation at a time from the main dataset, while producing one massive, augmented tidy data frame with the results. Given that I am just having trouble with the looping, and I can produce everything one data frame at a time, I will first show what I am trying to do with two data frames. Then, I will (poorly) attempt to solve the problem with a loop, which is the part for which I would greatly appreciate your assistance.
First, let's load some libraries:
# load libraries
library(dplyr)
library(purrr)
library(tibble)
library(ordinal)
library(reprex)
library(magrittr)
Next, let's create some sample data:
# create original df in the right format
data0 = data.frame(country = rep(c("Algeria","Belgium","Canada","Denmark","England", "France"),times=10),
x1 = rep(0:1),times=30,
x2 = rnorm(n = 60, mean = 100, sd = 5),
y = rep(1:6),times=10)
data0$country = factor(data0$country)
data0$y = factor(data0$y)
data0 <- data0 %>% dplyr::select(country,x1,x2,y)
Building on this post, I'll create a custom tidier for clmm2
models:
tidy.clmm2 <-
function(fit){
results = as.data.frame(coefficients(summary(fit)))
colnames(results) = c("estimate","std.error","statistic","p.value")
results %>% tibble::rownames_to_column("term")
}
Estimate the first model, tidy it, and augment it with confidence intervals, odds ratios, and model observation number:
# model 1: get the dataset with one less observation, removing the top row
data1 <- data0 %>% dplyr::slice(-1)
# model 1: estimate
m1 <- clmm2(y ~ x1 + x2, random=country, data=data1, Hess = TRUE)
summary(m1)
# model 1: get odds ratio and renames columns for model 1
or1 <- as.data.frame(exp(coef(m1)))
or1 <- or1 %>% rename(estimate.odds =1)
or1$term <- row.names(or1)
# model 1: tidy the model, get the CIs, and store the observation/model number
tidy_m1 <- tidy.clmm2(m1)
tidy_m1$obs <- m1$nobs[[1]]
tidy_m1$conf.low <- tidy_m1$estimate - (1.96 * tidy_m1$std.error)
tidy_m1$conf.high <- tidy_m1$estimate + (1.96 * tidy_m1$std.error)
# model 1: merge over the odds ratios and make final data
tidy_m1 <- left_join(tidy_m1, or1, by=c("term"))
Do the same for model 2, and bind the final output with model 1:
# model 2: get the dataset with one less observation, removing the top row
data2 <- data1 %>% dplyr::slice(-1)
# model 2: estimate
m2 <- clmm2(y ~ x1 + x2, random=country, data=data2, Hess = TRUE)
summary(m2)
# model 2: get odds ratio and renames columns for model 1
or2 <- as.data.frame(exp(coef(m2)))
or2 <- or2 %>% rename(estimate.odds =1)
or2$term <- row.names(or2)
# model 2: tidy the model, get the CIs, and store the observation/model number
tidy_m2 <- tidy.clmm2(m2)
tidy_m2$obs <- m2$nobs[[1]]
tidy_m2$conf.low <- tidy_m2$estimate - (1.96 * tidy_m2$std.error)
tidy_m2$conf.high <- tidy_m2$estimate + (1.96 * tidy_m2$std.error)
# model 2: merge over the odds ratios and make final data
tidy_m2 <- left_join(tidy_m2, or2, by=c("term"))
## Bind everything together to get a final data frame
tidy_final <- dplyr::bind_rows(tidy_m1,tidy_m2)
Here is my (poor) attempt at a loop, building on this excellent post:
# create vector to store observation/data frame numbers
vector <- 1:59
# start of model
df_final <- purrr::map_dfr(1:59,
function(i) data.frame(model = i,
tidy.clmm2(glm(as.formula(paste0('y ~ ', i)),
random=country,
Hess = TRUE,
data = data[vector]))))
How do I fix the above in line with what I was doing one model at a time? Don't worry about the NaN
s, as I have only provided a highly-stylized example to facilitate responses.
First I would improve your tidier function to do all of the extra tidying you were doing for each model (I'm assuming a tidier for this model class doesn't already exist in some package or other):
I don't know
purrr
, but you can then use the base functionlapply
(which I guess is similar) to estimate the model and the get the tidy parameters on successively smaller datasets, befor passing the results tobind_rows
. Here I only do it for the first 10: