Kruskal-Wallis statistic & p-value matrix loop for data subsets in R

Question

Kruskal-Wallis statistic & p-value matrix loop for data subsets in R

218 Views Asked by italia At 17 August 2025 at 15:46

Consider a dataset "trees" which has several factors and two numerical continuous variables. Two of these variables, plot.type (with classes "one", "ten", "non") and demog (with classes "seed", "sap", "adult"), are used to subset the data. For every subset, Kruskal-Wallis tests should be run on the two continuous variables: count & prop, using the grouping variable guild (with classes "fast", "med", "slow").

Without individually writing out the test on each subset and subset-of-subset of the data, is there a quick way in R to accomplish this task and put calculated p values to a matrix?

Example data

library(dplyr)
set.seed(123)
Data <- tbl_df(
  data.frame(
    plot.type = as.factor(rep(c("one", "ten", "non"), each = 80)),
    guild.type = as.factor(sample(c("fast", "med", "slow"), replace = TRUE, prob = c(.12, .28, .60))),
    demog = as.factor(sample(c("seed", "sap", "adult"),, prob = c(.33, .33, .33))),
    prop   = runif(240, 0, 10),
    count    = runif(240, 0, 10))
  )%>%
  group_by(plot.type)
trees <- as.data.frame(Data)

Was looking at other similar questions:

Tried this:

library(dplyr)
library(tidyr)
nm1 <- names(lhs)[4:5]
f1 <- function(x,y) kruskal.test(x~y)$p.value

tree %>% 
  do({data.frame(Map(f1, .[nm1], list(.$compare_by)))}) %>% 
  unite(Data_subsets, demog, plot.type, sep="_")

But missing how to include the guild infos as the grouping variable for the KW tests. Perhaps I should try to transform guild to a wide format table here?

Example of desired output:

Data_subsets fast.stat  fast.p  med.stat  med.p  slow.stat slow.p          
1   seed_one   <value>  <value> <value>  <value>  <value> <value>
2   seed_ten   <value>  <value> <value>  <value>  <value> <value>
3   seed_non   <value>  <value> <value>  <value>  <value> <value>
4   sap_one    <value>  <value> <value>  <value>  <value> <value>
5   sap_ten    <value>  <value> <value>  <value>  <value> <value>
6   sap_non    <value>  <value> <value>  <value>  <value> <value>
7   adult_one   <value>  <value> <value>  <value>  <value> <value>
8   adult_ten   <value>  <value> <value>  <value>  <value> <value>
9   adult_non   <value>  <value> <value>  <value>  <value> <value>

Any suggestions? Thanks in advance!

Original Q&A

There are 1 best solutions below

**Hani Alsafadi** · Accepted Answer

If I understand you correctly, you can just unite all three factors together, run the Kruskal-Wallis test on all the resulting group, then spread your table wide to have the fast, med, slow separated by columns.


library(dplyr)
set.seed(12)
Data <- tbl_df(
  data.frame(
    plot.type = as.factor(rep(c("one", "ten", "non"), each = 80)),
    guild.type = as.factor(sample(c("fast", "med", "slow"), replace = TRUE, prob = c(.12, .28, .60))),
    demog = as.factor(sample(c("seed", "sap", "adult"),, prob = c(.33, .33, .33))),
    prop   = runif(240, 0, 10),
    count    = runif(240, 0, 10))
  )%>%
  group_by(plot.type)
trees <- as.data.frame(Data)

library(dplyr)
library(tidyr)

trees %>% 
  unite(Data_subsets, demog, plot.type, guild.type, sep="_", remove = F) %>% 
  group_by(Data_subsets) %>% 
  summarise(stats=kruskal.test(count~prop)$stat, p = kruskal.test(count~prop)$p.value) %>% 
  separate(Data_subsets, c("demog", "plot.type", "guild.type")) %>% 
  unite(Subsets, demog, plot.type, sep="_") %>% 
  pivot_wider(id_cols = Subsets, 
            names_from = guild.type, 
            values_from = c("stats", "p"))

This would result in something like this:

 A tibble: 9 x 7
  Subsets   stats_med stats_slow stats_fast  p_med p_slow p_fast
  <chr>         <dbl>      <dbl>      <dbl>  <dbl>  <dbl>  <dbl>
1 adult_non        26         NA         NA  0.463 NA     NA    
2 adult_one        26         NA         NA  0.463 NA     NA    
3 adult_ten        25         NA         NA  0.462 NA     NA    
4 sap_non          NA         25         NA NA      0.462 NA    
5 sap_one          NA         26         NA NA      0.463 NA    
6 sap_ten          NA         26         NA NA      0.463 NA    
7 seed_non         NA         NA         26 NA     NA      0.463
8 seed_one         NA         NA         25 NA     NA      0.462
9 seed_ten         NA         NA         26 NA     NA      0.463

The NAs are just there because the data does not have that comparison as is. I would choose to keep the table without pivoting_wider. Something like this:

trees %>% 
  unite(Data_subsets, demog, plot.type, guild.type, sep="_", remove = F) %>% 
  group_by(Data_subsets) %>% 
  summarise(stats=kruskal.test(count~prop)$stat, p = kruskal.test(count~prop)$p.value) %>% 
  separate(Data_subsets, c("demog", "plot.type", "guild.type"))

results:

# A tibble: 9 x 5
  demog plot.type guild.type stats        p
  <chr> <chr>     <chr>      <dbl>    <dbl>
1 adult non       med          266 4.38e-22
2 adult one       med          266 4.38e-22
3 adult ten       med          265 6.25e-22
4 sap   non       slow         265 6.25e-22
5 sap   one       slow         266 4.38e-22
6 sap   ten       slow         266 4.38e-22
7 seed  non       fast         266 4.38e-22
8 seed  one       fast         265 6.25e-22
9 seed  ten       fast         266 4.38e-22

I hope this helps.

H

Kruskal-Wallis statistic & p-value matrix loop for data subsets in R

There are 1 best solutions below

Related Questions in R

Related Questions in DPLYR

Related Questions in TIDYR

Related Questions in P-VALUE

Related Questions in KRUSKAL-WALLIS

Trending Questions

Popular # Hahtags

Popular Questions