Consider a dataset "trees" which has several factors and two numerical continuous variables. Two of these variables, plot.type (with classes "one", "ten", "non") and demog (with classes "seed", "sap", "adult"), are used to subset the data. For every subset, Kruskal-Wallis tests should be run on the two continuous variables: count & prop, using the grouping variable guild (with classes "fast", "med", "slow").
Without individually writing out the test on each subset and subset-of-subset of the data, is there a quick way in R to accomplish this task and put calculated p values to a matrix?
Example data
library(dplyr)
set.seed(123)
Data <- tbl_df(
data.frame(
plot.type = as.factor(rep(c("one", "ten", "non"), each = 80)),
guild.type = as.factor(sample(c("fast", "med", "slow"), replace = TRUE, prob = c(.12, .28, .60))),
demog = as.factor(sample(c("seed", "sap", "adult"),, prob = c(.33, .33, .33))),
prop = runif(240, 0, 10),
count = runif(240, 0, 10))
)%>%
group_by(plot.type)
trees <- as.data.frame(Data)
Was looking at other similar questions:
- R: Kruskal-Wallis test in loop over specified columns in data frame
- Kruskal - Wallis p-value matrix for data subsets with R
Tried this:
library(dplyr)
library(tidyr)
nm1 <- names(lhs)[4:5]
f1 <- function(x,y) kruskal.test(x~y)$p.value
tree %>%
do({data.frame(Map(f1, .[nm1], list(.$compare_by)))}) %>%
unite(Data_subsets, demog, plot.type, sep="_")
But missing how to include the guild infos as the grouping variable for the KW tests. Perhaps I should try to transform guild to a wide format table here?
Example of desired output:
Data_subsets fast.stat fast.p med.stat med.p slow.stat slow.p
1 seed_one <value> <value> <value> <value> <value> <value>
2 seed_ten <value> <value> <value> <value> <value> <value>
3 seed_non <value> <value> <value> <value> <value> <value>
4 sap_one <value> <value> <value> <value> <value> <value>
5 sap_ten <value> <value> <value> <value> <value> <value>
6 sap_non <value> <value> <value> <value> <value> <value>
7 adult_one <value> <value> <value> <value> <value> <value>
8 adult_ten <value> <value> <value> <value> <value> <value>
9 adult_non <value> <value> <value> <value> <value> <value>
Any suggestions? Thanks in advance!
If I understand you correctly, you can just unite all three factors together, run the Kruskal-Wallis test on all the resulting group, then spread your table wide to have the fast, med, slow separated by columns.
This would result in something like this:
The NAs are just there because the data does not have that comparison as is. I would choose to keep the table without pivoting_wider. Something like this:
results:
I hope this helps.
H