I'm performing the Wilcoxon test with R on a large List (containing 86 data frames with a variable number of rows and values). I don't understand why, after p-values < 10^-307, it returns some p-values equal to 0 and then goes back to "normal" values. This happens with data frames that have a very high number of rows (about 4000 or more), but not with data frames of about 1000 rows. Am I getting the script wrong, or is it possible to get values of 0? If so, is there a particular way to interpret them?
Here I report my script:
for (i in 1:length(List)) {
  for (j in 2:(nrow(List[[i]]) - 1)) {
    ## split the data frame at the value in row j of column 2
    divider <- List[[i]][j, 2]
    ValoreInfe   <- List[[i]][List[[i]][, 2] <  divider, ]
    ValoreSupUgu <- List[[i]][List[[i]][, 2] >= divider, ]
    if (j == 2) { Num_ValoreInfe <- as.numeric(ValoreInfe[2]) }
    if (j != 2) { Num_ValoreInfe <- as.numeric(ValoreInfe[, 2]) }
    Num_ValoreSupUgu <- as.numeric(ValoreSupUgu[, 2])
    ## compare the two groups and store the p-value
    b <- wilcox.test(Num_ValoreInfe, Num_ValoreSupUgu)
    List2.0[[i]][j, 3] <- b$p.value
  }
}
Here is an example of my results:
0.000000e+00
8.343024e-02
1.435822e-02
2.716505e-03
5.370877e-04
1.089895e-04
2.250558e-05
4.706192e-06
9.936437e-07
2.114061e-07
4.526195e-08
9.741929e-09
2.106339e-09
4.572291e-10
9.960156e-11
9.960156e-11
[...]
0
0
0
0
0
[...]
2.114061e-07
9.936437e-07
4.706192e-06
2.250558e-05
1.089895e-04
5.370877e-04
2.716505e-03
1.435822e-02
0.000000e+00
Normally, the smallest number that R can distinguish from zero is about 1e-308 (i.e. 10^(-308)): specifically, .Machine$double.xmin is 2.225074e-308. More precisely, R can handle slightly smaller (denormalized) values, down to roughly 5e-324; see ?.Machine. If you want to deal with numbers smaller than that, you have to do something clever, like keeping track of their logarithms instead (log(.Machine$double.xmin) is about -708, so you could easily keep track of much smaller numbers that way). Some of the p-value computations in R allow you to retrieve the log-p-value rather than the p-value, but the Wilcoxon test doesn't have such a capability. While it might be possible to build such a capability from scratch if you needed it badly enough, researchers usually just treat such p-values as "extremely small"; you could report them as "< 1e-308" if you wanted. The only research field I've seen where people worry about the precise values of such tiny p-values is bioinformatics, where the p-value is treated as a metric in its own right rather than as a qualitative indicator of the clarity of a statistical difference.
Here's a small example that tests the p-value of non-overlapping sets with progressively larger sample sizes, showing the p-value decreasing and then underflowing to zero (see the points lying on the lower y-axis at the right edge of the plot):
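Something along these lines (a minimal sketch; the group sizes, the mean shift of 10 and the random seed are arbitrary choices here):
## two clearly separated groups, with progressively larger group sizes
set.seed(101)
nvec <- round(10^seq(1, 3.3, by = 0.1))   # group sizes from 10 up to about 2000
pvals <- sapply(nvec, function(n) {
  x <- rnorm(n, mean = 0)
  y <- rnorm(n, mean = 10)                # essentially non-overlapping groups
  wilcox.test(x, y)$p.value
})
lp <- log10(pvals)
lp[is.infinite(lp)] <- -330               # p-values that underflowed to 0: pin them to the bottom
plot(nvec, lp, log = "x", ylim = c(-330, 0),
     xlab = "group size", ylab = "log10(p-value)")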
hacking the log-p values
Digging into the code of stats:::wilcox.test.default, we can find the place where the p-values are computed from the test statistic and the group-wise sample sizes, and re-compute them with log.p = TRUE. The code below skips a few details, like accounting for ties and allowing different alternative hypotheses (i.e. it assumes a two-sided test). This gives you the natural log of the p-value; you can convert back to log10 by multiplying by log10(exp(1)).
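A minimal sketch of that idea (assuming no ties, a two-sided alternative and the continuity-corrected normal approximation, i.e. the large-sample branch of wilcox.test; the function name wilcox_log_pval is just illustrative):
wilcox_log_pval <- function(x, y) {
  n.x <- as.double(length(x))
  n.y <- as.double(length(y))
  r <- rank(c(x, y))
  STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1) / 2   # the W statistic
  z <- STATISTIC - n.x * n.y / 2
  SIGMA <- sqrt(n.x * n.y * (n.x + n.y + 1) / 12)           # no correction for ties in this sketch
  z <- (z - sign(z) * 0.5) / SIGMA                          # continuity correction
  ## two-sided p-value = 2 * upper tail of |z|, kept on the natural-log scale
  log(2) + pnorm(abs(z), lower.tail = FALSE, log.p = TRUE)
}
## usage: natural log of the p-value; multiply by log10(exp(1)) for log10
wilcox_log_pval(rnorm(4000), rnorm(4000, mean = 10)) * log10(exp(1))   # roughly -1300
For comparison, wilcox.test(rnorm(4000), rnorm(4000, mean = 10))$p.value simply underflows to 0.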