I am running a power analysis for a mixed effects model in R using the simr package. My estimated power seems way too high, with effect sizes that are too small. In reality my sample size is fixed, and I know a lot about the distribution of the variables in my dataset.
I have simulated the following data, which closely matches my actual dataset. I have 4,000 participants who are clustered within 40 counties. My y is ~normally distributed with a mean of 126 and sd of 16. My predictor (participant-level) is ~normally distributed with mean of 27, sd of 25, and lower bound of 0.35. Values of the participant-level predictor within a county are correlated. I am only interested in the fixed-effect of this predictor. I also simulated covariates (age, sex, household income, and education), which i will need to adjust for.
library(truncnorm)
library(tidyverse)
library(simr)
set.seed(1234)
# Simulate participants
simulated.SP <- tibble(
y = rnorm(4000, mean=126, sd=16) # assumes outcome mean is 126 with sd of 16, and 4000 participants
)
# add county
simulated.SP$county <- sample(1:40, size = nrow(simulated.SP), replace = TRUE)
vector.id <- c( unique(simulated.SP$county)) # simulate predictor values
count <- as.data.frame(vector.id) # data.frame of predictor values for each county
for ( i in unique(simulated.SP$county)){
predictor <- cbind( rtruncnorm(n=40,a=0.35,b=100, mean=27, sd=23)) # predictor concentrations: mean of 27, lower value of 0.35, and upper bound of 100
count$predictor[count$vector.id==i] <- predictor }
colnames(count)[colnames(count)=="vector.id"] <- "county"
simulated.SP<-merge(simulated.SP,count,by="county",all=T)
jitter_sd <- 10 # Jitter the predictor values within a county
simulated.SP$predictor <- pmax(simulated.SP$predictor + rnorm(length(simulated.SP$predictor ), mean = 0, sd = jitter_sd), 0.35)
# Add additional variables
simulated.SP$sex <- sample(c("M", "F"), 4000, replace = TRUE)
simulated.SP$age <- rnorm(4000, mean = 50, sd = 15)
simulated.SP$householdincome <- rnorm(4000, mean = 52000, sd = 20000)
simulated.SP$education <- sample(c("low", "medium", "high"), 4000, replace = TRUE)
simulated.SP$householdincome<-scale(simulated.SP$householdincome, center = TRUE, scale = TRUE) # rescale
simulated.SP$age<-scale(simulated.SP$age, center = TRUE, scale = TRUE) # rescale
simulated.SP$education<-as.factor(simulated.SP$education)
simulated.SP$sex<-as.factor(simulated.SP$sex)
simulated.SP$county<-as.factor(simulated.SP$county)
Here are the three models I have run, all of which are returning very small effect estimates and high power:
# 1. Mean difference in y per unit change in predictor
model.of.interest <- lmer(y ~ predictor + sex + age + householdincome + education +(1|county), data=simulated.SP)
summary(model.of.interest) # current effect estimate: -0.01365
fixef(model.of.interest)['predictor'] # -0.01372639
powerSim(model.of.interest, nsim=20)
# 0.00% ( 0.00, 16.84)
fixef(model.of.interest)['predictor'] <- 1 # set effect size to 1
powerSim(model.of.interest, nsim=20)
# 100.0% (83.16, 100.0)
# 2. GMR of y per unit change in predictor:
model.of.interest <- lmer(log(y) ~ predictor + sex + age + householdincome + education +(1|county), data=simulated.SP)
summary(model.of.interest) #-1.300e-04
fixef(model.of.interest)['predictor'] <- 0.0295588 # set effect size corresponding to GMR of 1.03
powerSim(model.of.interest, nsim=20, test=fixed("predictor"))
# 100.0% (83.16, 100.0)
# 3. GMR of y per log-doubling of predictor:
model.of.interest <- lmer(log(y) ~ log2(predictor) + sex + age + householdincome + education +(1|county), data=simulated.SP)
summary(model.of.interest)
fixef(model.of.interest)['log2(predictor)'] # current effect estimate:-0.001568696 ;corresponds to GMR of exp(x) = 0.9984
powerSim(model.of.interest, nsim=20, test=fixed("log2(predictor)"))
# 15.00% ( 3.21, 37.89)
fixef(model.of.interest)['log2(predictor)'] <- 0.0295588 # corresponding to GMR of 1.03 power:
powerSim(model.of.interest, nsim=20, test=fixed("log2(predictor)"))
# 100.0% (83.16, 100.0)
What am I doing wrong here?