I'm trying to mimic an experiment within a dataframe, running numerous simulations with t-tests to extract significance values.
I created the "treatment" variable within my dataset to randomly sort observations into the treatment group (1) or control group (0). I then ran a t-test testing the treatment variable against "ideology," a continuous variable within my dataset, and tried to extract the difference between the groups and the p-values.
set.seed(17800)
differences <- c()
p_values <- c()
for (i in 1:5000) {
test <- experiment_pool %>%
mutate(treatment = sample(rep(c(0,1), length = n())))
t_test_result <- t.test(x = test$Ideology, y = test$treatment)
differences[i] <- diff(t_test_result$estimate)
p_values[i] <- t_test_result$p.value
}
results <- data.frame(Difference = differences, PValue = p_values)%>%
mutate(sig_diff = if_else(p_value < 0.5, 1, 0))
My resulting code has constant values for the "Difference" and "PValue" variables. I checked that the "treatment" sampling has good distribution, so I think the problem is in the t-test step.
Since you didn't upload code about what your variable
experiment_pool
looks like, I rewrote the code so that it would achieve what I think you wanted to achieve (please correct me if this was not what you were looking for).Since a t-test is based on the general linear model where we tend to assume that our predictor variables (here the treatment) are fixed variables and only the y-variables (here ideology) are random variables, the only thing that really needs to be varied per replication are the values of the y-variable. In my code, I generated a treatment variable where the groups are equal, but you could of course test scenarios with different group sizes as well.
One then needs to pick a population value of how strongly the treatment variable is related to Ideology, i.e. how strong the population difference in ideology between treatment and control group is. I used the package
faux
because one can create bivariately associated variables via a correlation coefficient, which I find rather handy.I think the main problem in your code was the way you defined the t-test, since when writing it as
t.test(x = x_variable, y = y_variable)
R thinks you have two continuous variables and tries to compare the mean ofx_variable
to the mean ofy_variable
, which leads to wrong results, when in reality the means you want to compare are the mean of group 1 of thex_variable
(e.g mean Ideology of treatment group) to the mean of group 2 of thex_variable
(e.g. mean Ideology of control group). Thus you have to writet.test(y_variable ~ x_variable)
.Lastly, if you really wanted "Ideology" to be your x and "treatment" (dichotomous) to be your y, a t-test is not the right model and you should instead choose something like a logistic regression.