I'm teaching a unit on simulation based inference (single proportion test) using the mosaic package in R, and I'm having trouble getting R to count the number of simulations in which the simulated proportion is greater than or equal to the observed data.
Specifically, I have a problem where 600 people are surveyed as to whether they plan to light fireworks on the Fourth of July, and 56% respond that they do. I want to test that this proportion is greater than a null hypothesis of p_0=0.5. Here's my current code:
library(mosaic)
set.seed(25)
# Set up a data frame of 600 responses, 56% of which are "Fireworks", remaining "No Fireworks"
FireworksData <- c(rep("Fireworks",600*.56),rep("No Fireworks",600*(0.44)))
Fireworks_df <- data.frame('FourthOfJulyPlans'=FireworksData)
# Simulation: Run 1000 resamples of the data, and calculate proportion that are "Fireworks"
Fireworks.Null <- do(1000)*(prop( ~ FourthOfJulyPlans,
data = resample(Fireworks_df),
success="Fireworks")+(0.50-0.56))
# The +(0.50-0.56) at the end centers this distribution at p_0=0.5
#Count simulated proportions that meet or exceed 0.56, and calculate p-value
count(Fireworks.Null>=.56)
count(Fireworks.Null>=.56)/simulations
The problem is that count(Fireworks.Null>=.56) only finds the values that are greater than 0.56, but not equal: it returns that there are 2 cases that meet or exceed 0.56, but in fact there are 3 with this set.seed(). The third case is equal to 0.56. I have no idea why this is happening: when I run a simple test example, I get the correct answer:
testvector <- c(0.57,0.57,0.57,0.56,0.56,0.55,0.55)
count(testvector>=0.56)
returns 5 as expected.
Any help is much appreciated!