I am trying to conduct several Mann-Whitney U tests that compare the effect of population on offspring sex ratio skews. I'm using RStudio. The dataset looks like:
data <- data.frame(
DamID = 1:50,
FemaleOffspring = sample(1:10, 50, replace = TRUE),
MaleOffspring = sample(1:10, 50, replace = TRUE),
SexRatio = runif(50, min = 0, max = 1),
BirthPop = sample(c('A', 'B'), 50, replace = TRUE),
Species = sample(c('R','X', 'Y', 'Z'), 50, replace = TRUE)
)
I've written the following code:
library(dplyr)
sumstats <- data %>%
group_by(Species, BirthPop) %>%
summarize(median=median(SexRatio),
IQR=IQR(SexRatio),
Min=min(SexRatio),
Max=max(SexRatio),
n=n(),
wilcox_p = wilcox.test(SexRatio ~ factor(BirthPop), data = ., alternative = "two.sided")$p.value
)
This gives me one p-value for the entire dataset, when I need a separate p-value for each species. I'm not sure what to do about this. Thanks in advance!
Two problems:
1. Use cur_data(). When you use . as the data argument, the call to wilcox.test() sees all of the data and does not honor the grouping that group_by has imposed.
2. When you group by BirthPop, each call to wilcox.test gets only "A" or only "B", but it needs to see both to be able to perform the test.
I suggest doing two levels of stats: first on both Species and BirthPop (to get the majority of your statistics), and then once on just Species for your Wilcoxon tests. We can easily bring these back together with a merge/join operation:
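A minimal sketch of that two-level approach, not necessarily the exact code intended above. It assumes dplyr 1.1.0 or later, where pick() supersedes the now-deprecated cur_data(); on older versions, substituting data = cur_data() should behave the same. The sample data is made up for illustration and built so that every species has both birth populations, since wilcox.test() fails on a group with only one level. The names sumstats, tests and result are placeholders.

```r
library(dplyr)

# Illustrative data only: BirthPop alternates so each species contains both A and B
set.seed(42)
data <- data.frame(
  DamID    = 1:50,
  SexRatio = runif(50, min = 0, max = 1),
  BirthPop = rep(c("A", "B"), length.out = 50),
  Species  = rep(c("R", "X", "Y", "Z"), each = 13, length.out = 50)
)

# Level 1: descriptive statistics for each Species x BirthPop combination
sumstats <- data %>%
  group_by(Species, BirthPop) %>%
  summarize(
    median = median(SexRatio),
    IQR    = IQR(SexRatio),
    Min    = min(SexRatio),
    Max    = max(SexRatio),
    n      = n(),
    .groups = "drop"
  )

# Level 2: one test per Species, so each call sees both populations.
# pick() returns only the rows of the current group (assumes dplyr >= 1.1.0).
tests <- data %>%
  group_by(Species) %>%
  summarize(
    wilcox_p = wilcox.test(SexRatio ~ factor(BirthPop),
                           data = pick(SexRatio, BirthPop),
                           alternative = "two.sided")$p.value,
    .groups = "drop"
  )

# Join the per-species p-values back onto the per-group statistics
result <- left_join(sumstats, tests, by = "Species")
result
```

Each species' p-value is repeated on both of its BirthPop rows, which is expected: the test compares the two populations within a species, so it belongs to the species rather than to either population.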