How does ggeffects test_predictions deal with categorical non-focal terms?


ggeffects has a "margin" argument in predict_response, which controls how non-focal terms are handled when estimating predicted values. This matters mostly when the non-focal terms are categorical. In test_predictions, however, there is no such argument.

  1. How does test_predictions deal with categorical non-focal terms by default?
  2. Is there a way to make it treat them as predict_response does when margin = "marginalmeans"?

In my case, I am fitting a logistic regression with an interaction between a numeric variable (year) and a categorical variable (parfam), with another categorical variable (countryname) as a control. I then estimate predicted probabilities and want to examine pairwise comparisons between them:

```r
library(ggeffects)

# Set seed for reproducibility
set.seed(123)

## Create and combine three datasets, to ensure unbalanced data

# Create first dataset
data1 <- data.frame(
  childcare = sample(c(0, 1), 5000, replace = TRUE, prob = c(0.8, 0.2)),
  parfam = sample(c("agr", "con", "cd", "lib", "sd", "green", "left", "rr"), 5000, replace = TRUE),
  year = sample(1970:2022, 5000, replace = TRUE),
  countryname = sample(c("Austria", "Belgium", "Finland", "France", "Germany", "UK"), 5000, replace = TRUE)
)

# Create second dataset
data2 <- data.frame(
  childcare = sample(c(0, 1), 5000, replace = TRUE, prob = c(0.9, 0.1)),
  parfam = sample(c("agr", "con", "cd", "rr"), 5000, replace = TRUE),
  year = sample(2000:2022, 5000, replace = TRUE),
  countryname = sample(c("France", "Germany", "UK"), 5000, replace = TRUE)
)

# Create third dataset
data3 <- data.frame(
  childcare = sample(c(0, 1), 5000, replace = TRUE, prob = c(0.5, 0.5)),
  parfam = sample(c("lib", "sd", "green", "left"), 5000, replace = TRUE),
  year = sample(2000:2022, 5000, replace = TRUE),
  countryname = sample(c("Austria", "Belgium", "Finland"), 5000, replace = TRUE)
)

# Combine datasets
data <- rbind(data1, data2, data3)

# Run a logistic regression
m <- glm(childcare ~ parfam * year + countryname,
         data = data,
         family = binomial)

# Get predicted probabilities using ggeffects, with margin = "marginalmeans"
pred_prob <- predict_response(m, terms = c("year [1980, 2000, 2020]", "parfam"), margin = "marginalmeans")

# Test pairwise comparisons
test_predictions(m, terms = c("parfam", "year [1980, 2000, 2020]"), collapse_levels = TRUE)
```

It seems that some of the differences reported by test_predictions do not match the differences between the predicted probabilities from predict_response. For example, in pred_prob the predicted value for "left" in 1980 is 0.2097297 and the predicted value for "cd" in 1980 is 0.1639278, a difference of 0.0458019. In the results from test_predictions, however, the contrast for left-cd in 1980 is 0.06.
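As a quick check of that arithmetic, here is a small sketch that uses only the two values quoted from pred_prob (it does not call ggeffects; the variable names are just for illustration). The gap rounds to 0.05, so the 0.06 from test_predictions does not look like a rounding artifact:

```r
# Predicted probabilities for 1980, copied from the pred_prob output quoted above
p_left_1980 <- 0.2097297
p_cd_1980 <- 0.1639278

# Difference implied by predict_response(..., margin = "marginalmeans")
diff_left_cd <- p_left_1980 - p_cd_1980

print(round(diff_left_cd, 7))  # 0.0458019
print(round(diff_left_cd, 2))  # 0.05, whereas test_predictions reports 0.06
```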

If I drop the "margin" argument, the results do seem to match:

```r
# Get predicted probabilities using ggeffects (default margin)
pred_prob <- predict_response(m, terms = c("year [1980, 2000, 2020]", "parfam"))

# Test pairwise comparisons
test_predictions(m, terms = c("parfam", "year [1980, 2000, 2020]"), collapse_levels = TRUE)
```

Moreover, when I ask for three levels of "year" it works well, but when I ask for five I get this error:

```r
# Test pairwise comparisons with five values of year
test_predictions(m, terms = c("parfam", "year [1980, 1990, 2000, 2010, 2020]"), collapse_levels = TRUE)
```

```
Error: The "pairwise", "reference", and "sequential" options of the hypotheses argument are not supported for marginaleffects commands which generate more than 25 rows of results. Use the newdata, by, and/or variables arguments to compute a smaller set of results on which to conduct hypothesis tests.
```
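One possible reading of this error is that the comparison grid has one row per combination of parfam level and requested year. That is my assumption, not something the error message states. Under it, the 8 levels of parfam give 24 rows with three years, which is under the 25-row limit, and 40 rows with five years. A base-R sketch of that count (the helper grid_rows is hypothetical and not part of ggeffects or marginaleffects):

```r
# The eight party families used in the simulated data
parfam_levels <- c("agr", "con", "cd", "lib", "sd", "green", "left", "rr")

# Hypothetical helper: number of parfam x year combinations,
# assuming one prediction row per combination
grid_rows <- function(years) {
  nrow(expand.grid(parfam = parfam_levels, year = years))
}

rows_three <- grid_rows(c(1980, 2000, 2020))
rows_five <- grid_rows(c(1980, 1990, 2000, 2010, 2020))

print(rows_three)  # 24
print(rows_five)   # 40
```

If that reading is right, it would explain why three values of year work and five do not, but it does not answer the question about how non-focal terms are handled.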
