Get significantly different groups from the Dunn test in R


In R, I compare groups with the dunn.test. Here is some example data, where "type" is the grouping variable:

my_table <- data.frame ("type" = c (rep ("low", 5), rep ("mid", 5), rep ("high", 5)),
                        "var_A" = rnorm (15),
                        "var_B" = c (rnorm (5), rnorm (5, 4, 0.1), rnorm (5, 12, 2)) 
                        )

I want to compare the variables var_A and var_B among the three groups with dunn.test (), which prints the following results:

library (dunn.test)
dunn.test (my_table$var_A, my_table$type)
>  Kruskal-Wallis rank sum test
>
> data: x and group
> Kruskal-Wallis chi-squared = 6.08, df = 2, p-value = 0.05
>
>
> Comparison of x by group                            
> (No adjustment)                                
> Col Mean-|
> Row Mean |       high        low
> ---------+----------------------
>      low |   0.919238
>          |     0.1790
>          |
>      mid |   0.989949   0.070710
>          |     0.1611     0.4718
>
> alpha = 0.05
> Reject Ho if p <= alpha/2

and

dunn.test (my_table$var_B, my_table$type)
> Kruskal-Wallis rank sum test
>
> data: x and group
> Kruskal-Wallis chi-squared = 12.5, df = 2, p-value = 0
>
>
> Comparison of x by group                            
> (No adjustment)                                
> Col Mean-|
> Row Mean |       high        low
> ---------+----------------------
>      low |   3.535533
>          |    0.0002*
>          |
>      mid |   1.767766  -1.767766
>          |     0.0385     0.0385
>
> alpha = 0.05
> Reject Ho if p <= alpha/2

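I understand that for var_A, there are no significant differences between the three groups. For var_B, only the groups "low" and "high" differ significantly (p = 0.0002 <= alpha/2 = 0.025); both comparisons involving "mid" give p = 0.0385, which is above that cutoff. When presenting the results, I could choose a table like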

library (tidyverse)
data.frame ("low" = my_table %>%
                filter (type == "low") %>%
                select (c ("var_A", "var_B")) %>%
                sapply (mean) %>%
                round (digits = 2),
            "mid" = my_table %>%
                filter (type == "mid") %>%
                select (c ("var_A", "var_B")) %>%
                sapply (mean) %>%
                round (digits = 2),
            "high" = my_table %>%
                filter (type == "high") %>%
                select (c ("var_A", "var_B")) %>%
                sapply (mean) %>%
                round (digits = 2 )
                )


>             low    mid   high
> var_A      0.14  -0.10   0.74
> var_B     -0.41   3.97  11.44
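
For what it's worth, the same table of group means can probably be built more compactly in base R. This is only a minimal sketch: the seed is arbitrary (the original data is unseeded), and `mean_table` is a name made up for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
my_table <- data.frame(type  = rep(c("low", "mid", "high"), each = 5),
                       var_A = rnorm(15),
                       var_B = c(rnorm(5), rnorm(5, 4, 0.1), rnorm(5, 12, 2)))

# Fix the display order of the groups (otherwise they sort alphabetically)
my_table$type <- factor(my_table$type, levels = c("low", "mid", "high"))

# Mean per group for each variable, then transpose so that there is
# one row per variable and one column per group
mean_table <- t(sapply(my_table[c("var_A", "var_B")],
                       function(v) tapply(v, my_table$type, mean)))
round(mean_table, 2)
```

Turning `type` into a factor with explicit levels is what keeps the columns in low, mid, high order.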

What I'd like to achieve is to add letters that indicate the results of the dunn.test. This could look something like

>               low         mid         high 
> var_A     0.14  a    -0.10  a      0.74  a
> var_B    -0.41  a     3.97 ab     11.44  b

So, my long but short question is: how can I tell the dunn.test function to output the grouping characters (e.g. "a", "ab" or "b")? Or is there a workaround to get the desired characters?

There is 1 best solution below

Cam On

The kruskal() function in the agricolae package might do what you're looking for. Its output includes a 'groups' element, which contains the letters corresponding to each group. Note that, according to the package details, the post-hoc comparison is done using Fisher's LSD, not Dunn's test. You can, however, pass the p.adj argument to adjust for multiple comparisons.

library(tidyverse)
library(agricolae)
library(reshape2)

my_table <- data.frame ("type" = c (rep ("low", 5), rep ("mid", 5), rep ("high", 5)),
                        "var_A" = rnorm (15),
                        "var_B" = c (rnorm (5), rnorm (5, 4, 0.1), rnorm (5, 12, 2)) 
)

# melt in order to use lapply 
my_MeltedTable = melt(my_table, id.vars='type')

# apply kruskal(value,type) across two levels of variable (var_A and var_B)
results = lapply(split(my_MeltedTable[,c("type", "value")], my_MeltedTable$variable), 
       function(x) kruskal(x$value, x$type, p.adj="bon"))

# the grouping information you'd like will be found in the 'groups' element
results$var_A$groups
results$var_B$groups
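
Before assembling a table, it may help to see how the two pieces relate. As far as I can tell, `means` is ordered by group name while `groups` is sorted by mean rank, so the letters are safest matched by row name rather than by position. Here is a minimal sketch on a hand-made stand-in list: `res_mock` and the helper `pair_means_letters` are hypothetical, and the stand-in only imitates the shape I am assuming `kruskal()` returns.

```r
# Hand-made stand-in imitating the assumed shape of one kruskal() result:
# `means` has one row per group (alphabetical), `groups` is sorted by rank
res_mock <- list(
  means  = data.frame(value = c(11.46, -0.45, 3.99),
                      row.names = c("high", "low", "mid")),
  groups = data.frame(value  = c(13, 8, 3),
                      groups = c("a", "b", "c"),
                      row.names = c("high", "mid", "low"),
                      stringsAsFactors = FALSE)
)

# Hypothetical helper: paste each group's rounded mean to its letter,
# matching the two data frames by row name instead of by position
pair_means_letters <- function(res, order = rownames(res$means)) {
  m <- round(res$means[order, 1], 2)
  l <- as.character(res$groups[order, "groups"])
  setNames(paste(sprintf("%.2f", m), l), order)
}

pair_means_letters(res_mock, order = c("low", "mid", "high"))
```

With a real result the same call would be something like `pair_means_letters(results$var_B, c("low", "mid", "high"))`, provided the element names match this assumption.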

There is probably a way to pull out the things you need from within the lapply() call, but I don't know how, so here is how I built the required table:

# create empty df for results
resTable <- data.frame(matrix(ncol = 6, nrow = 2))

# results$means contains means of variable per group
# assign col names from row names in results
colnames(resTable) = row.names(results$var_A$means)

# pull out means for var_A (a column, so transpose it into a row) & round to 2 digits
resTable[1,1:3] = round(t(results$var_A$means[,1]), digits = 2)
# same for var_B
resTable[2,1:3] = round(t(results$var_B$means[,1]), digits = 2)

# results$groups contains the letters denoting the statistical grouping.
# Its rows are sorted by mean rank, not by group name, so look the letters
# up by row name to keep them aligned with the means above
resTable[1,4:6] = as.character(results$var_A$groups[row.names(results$var_A$means), 2]) # grouping for var_A
resTable[2,4:6] = as.character(results$var_B$groups[row.names(results$var_B$means), 2]) # grouping for var_B

resTable = resTable[,c(2,5,3,6,1,4)] # re-order cols: low, mid, high, each followed by its letter
rownames(resTable) = c("var_A", "var_B") # name rows
colnames(resTable) = c("low", " ","mid", " ", "high","") # name cols

And after all that long-windedness!

        low    mid    high  
var_A  0.12 a 0.40 a -0.76 a
var_B -0.45 c 3.99 b 11.46 a
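
If the letters need to come from Dunn's test itself rather than from agricolae's LSD-based comparison, one possible workaround is to feed the pairwise p-values returned by dunn.test() into multcompLetters() from the multcompView package. That package is not mentioned in the question, so treat this as a sketch under assumptions: the seed is arbitrary, and `letters_B` is just an illustrative name.

```r
library(dunn.test)     # dunn.test() returns its results invisibly as a list
library(multcompView)  # multcompLetters() turns pairwise p-values into letters

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
my_table <- data.frame(type  = rep(c("low", "mid", "high"), each = 5),
                       var_B = c(rnorm(5), rnorm(5, 4, 0.1), rnorm(5, 12, 2)))

d <- dunn.test(my_table$var_B, my_table$type, method = "none")

# multcompLetters() expects names like "high-low", so drop the spaces
# from dunn.test's "high - low" labels
p <- setNames(d$P.adjusted, gsub(" ", "", d$comparisons))

# dunn.test reports one-sided p-values and rejects at p <= alpha/2,
# so alpha/2 is used as the threshold here
alpha <- 0.05
letters_B <- multcompLetters(p, threshold = alpha / 2)$Letters
letters_B
```

Groups that share a letter are not significantly different at the chosen threshold. The resulting named vector can be pasted next to the group means in the same way as the agricolae letters above.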