How do I know that wilcox_test (package rstatix) recognizes the correct column for each individual sample, when doing a paired test?
Here is an example:
install.packages("rstatix")
install.packages("datarium")
library(rstatix)
library(datarium)
data("mice2", package = "datarium")
mice2.long <- mice2 %>% gather(key = "group", value = "weight", before, after)
mice2.long %>% wilcox_test(weight ~ group, paired = TRUE)
The test seems to work correctly, but I never specified that the column "id" identifies the individual samples. How did the test know which observations form the "pairs"?
The argument passed to paired is TRUE, so the function pairs the first value of before with the first value of after, the second with the second, and so on. It does not use the column id at all. If the data are not arranged so that the first value of before belongs to the same subject as the first value of after (and likewise for every other row), wilcox_test will give incorrect results. Here is a quick example:
Now if we randomize the long data, so that the first value of before no longer corresponds to the first value of after, we should get different results.
Try again with a different seed and you get different results.
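The positional pairing described above can be sketched in base R. This is not the example from the original answer: the data frame long and the helper paired_v are hypothetical, and the helper splits the weights by group before calling stats::wilcox.test, which is assumed here to mirror what wilcox_test does internally. A deterministic reordering (reversing the after rows) is used instead of a random shuffle so that the numbers are reproducible.

```r
# Hypothetical toy data (not the example from the original answer).
long <- data.frame(
  id     = rep(1:5, times = 2),
  group  = rep(c("before", "after"), each = 5),
  weight = c(10, 20, 30, 40, 50,   # before, ids 1..5
             12, 19, 35, 37, 58)   # after,  ids 1..5
)

# Paired statistic V computed by position: the i-th "after" row is
# paired with the i-th "before" row, whatever the id column says.
paired_v <- function(d) {
  x <- d$weight[d$group == "after"]
  y <- d$weight[d$group == "before"]
  unname(wilcox.test(x, y, paired = TRUE)$statistic)
}

v_ordered <- paired_v(long)

# Reverse the "after" rows only, so positions no longer match ids.
scrambled <- rbind(long[long$group == "before", ],
                   long[rev(which(long$group == "after")), ])
v_scrambled <- paired_v(scrambled)

# Sorting by group and id restores the intended pairing.
restored <- scrambled[order(scrambled$group, scrambled$id), ]
v_restored <- paired_v(restored)

print(c(ordered = v_ordered, scrambled = v_scrambled, restored = v_restored))
```

With these values the statistic is 11 for the correctly ordered data, 8 after the reordering, and 11 again once the rows are sorted by group and id. Sorting by id within each group before calling the test is therefore a cheap safeguard when the row order cannot be trusted.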
Note that this does not happen in your case, i.e. randomizing mice2 does not produce different results. Why? Because all the values of before are smaller than all the values of after, i.e. the maximum of the before values is smaller than the minimum of the after values. This is critical for computing the Wilcoxon statistic: regardless of the permutation, all the differences after - before are positive, so every rank is counted as positive and the statistic is simply sum(1:10) = 55. This is the test statistic.
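The argument above can be checked with a small base R sketch. The vectors before and after below are hypothetical values chosen so that every before value is below every after value (they are not the mice2 data), and the seed is arbitrary. Ties among the differences only affect the p-value, so the resulting warnings are suppressed.

```r
set.seed(1)  # arbitrary seed; any seed gives the same statistic here

# Hypothetical weights: every "before" value is below every "after" value.
before <- c(18, 20, 21, 19, 22, 17, 20, 23, 19, 21)
after  <- c(38, 41, 36, 40, 39, 42, 37, 40, 43, 38)
stopifnot(max(before) < min(after))

# Statistic V for five random re-pairings of "after" against "before".
# All differences are positive, so V is the sum of all ranks each time.
v <- replicate(5, {
  x <- sample(after)
  unname(suppressWarnings(wilcox.test(x, before, paired = TRUE))$statistic)
})
print(v)
```

Every re-pairing yields 55, which equals sum(1:10). The same check on the real data would be max(mice2$before) < min(mice2$after).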