simr (R) returning high power and huge effects

55 Views Asked by At

I am running a power analysis for a mixed effects model in R using the simr package. My estimated power seems way too high, with effect sizes that are too small. In reality my sample size is fixed, and I know a lot about the distribution of the variables in my dataset.

I have simulated the following data, which closely matches my actual dataset. I have 4,000 participants who are clustered within 40 counties. My y is ~normally distributed with a mean of 126 and sd of 16. My predictor (participant-level) is ~normally distributed with mean of 27, sd of 25, and lower bound of 0.35. Values of the participant-level predictor within a county are correlated. I am only interested in the fixed-effect of this predictor. I also simulated covariates (age, sex, household income, and education), which i will need to adjust for.

library(truncnorm)
library(tidyverse)
library(simr)
set.seed(1234)
# Simulate participants
simulated.SP <- tibble(
  y = rnorm(4000, mean=126, sd=16) # assumes outcome mean is 126 with sd of 16, and 4000 participants
)
# add county 
simulated.SP$county <- sample(1:40, size = nrow(simulated.SP), replace = TRUE)
vector.id <- c( unique(simulated.SP$county)) # simulate predictor values
count <- as.data.frame(vector.id) # data.frame of predictor values for each county

for ( i in unique(simulated.SP$county)){
  predictor <- cbind( rtruncnorm(n=40,a=0.35,b=100, mean=27, sd=23)) # predictor concentrations: mean of 27, lower value of 0.35, and upper bound of 100
  count$predictor[count$vector.id==i] <- predictor  }  

colnames(count)[colnames(count)=="vector.id"] <- "county"
simulated.SP<-merge(simulated.SP,count,by="county",all=T)

jitter_sd <- 10 # Jitter the predictor values within a county
simulated.SP$predictor <- pmax(simulated.SP$predictor  + rnorm(length(simulated.SP$predictor ), mean = 0, sd = jitter_sd), 0.35)

# Add additional variables
simulated.SP$sex <- sample(c("M", "F"), 4000, replace = TRUE)
simulated.SP$age <- rnorm(4000, mean = 50, sd = 15)
simulated.SP$householdincome <- rnorm(4000, mean = 52000, sd = 20000)
simulated.SP$education <- sample(c("low", "medium", "high"), 4000, replace = TRUE)
simulated.SP$householdincome<-scale(simulated.SP$householdincome, center = TRUE, scale = TRUE) # rescale
simulated.SP$age<-scale(simulated.SP$age, center = TRUE, scale = TRUE) # rescale
simulated.SP$education<-as.factor(simulated.SP$education)
simulated.SP$sex<-as.factor(simulated.SP$sex)
simulated.SP$county<-as.factor(simulated.SP$county)

Here are the three models I have run, all of which are returning very small effect estimates and high power:


# 1. Mean difference in y per unit change in predictor 
model.of.interest <- lmer(y ~ predictor + sex + age + householdincome + education +(1|county), data=simulated.SP)
summary(model.of.interest) # current effect estimate: -0.01365 
fixef(model.of.interest)['predictor'] # -0.01372639 
powerSim(model.of.interest, nsim=20)
#    0.00% ( 0.00, 16.84)

fixef(model.of.interest)['predictor'] <- 1 # set effect size to 1
powerSim(model.of.interest, nsim=20) 
#   100.0% (83.16, 100.0)

# 2. GMR of y per unit change in predictor:
model.of.interest <- lmer(log(y) ~ predictor + sex + age + householdincome + education +(1|county), data=simulated.SP)
summary(model.of.interest)   #-1.300e-04
fixef(model.of.interest)['predictor'] <- 0.0295588 # set effect size corresponding to GMR of 1.03 
powerSim(model.of.interest, nsim=20, test=fixed("predictor"))
#    100.0% (83.16, 100.0)

# 3. GMR of y per log-doubling of predictor:
model.of.interest <- lmer(log(y) ~ log2(predictor) + sex + age + householdincome + education +(1|county), data=simulated.SP)
summary(model.of.interest)  
fixef(model.of.interest)['log2(predictor)'] # current effect estimate:-0.001568696 ;corresponds to GMR of exp(x) = 0.9984
powerSim(model.of.interest, nsim=20, test=fixed("log2(predictor)"))
#  15.00% ( 3.21, 37.89)

fixef(model.of.interest)['log2(predictor)'] <- 0.0295588 # corresponding to GMR of 1.03 power:
powerSim(model.of.interest, nsim=20, test=fixed("log2(predictor)"))
#    100.0% (83.16, 100.0)

What am I doing wrong here?

0

There are 0 best solutions below