I have been struggling for a while to make my own CLD from a table of pairwise comparisons with p-values. I know it is possible with multcomp, but I would like to generate my own DIY function that can be adapted for different post-hoc outputs. Of course, there are two challenging aspects: the logic behind the groups generation and the programming implementation.
My "logic behind the groups generation" is this:
- Order treatments by mean.
- Assign treatment with highest mean to group 'a'
- From there, loop along all treatments. Compare each treatment i with the treatment in all already existing groups.
- If treatment i is not different from all the treatments of an already existing group, assign it to that group.
- If treatment i is different from at least one element from all the existing groups, create a new group where the treatment i and all the treatments not significantly different from i are assigned.
I have used loops because I find it easier to see what I am doing.
It would be great if somebody can point out any issue in the logic or in the implementation. If anybody can find any error or offer any hint on how to make it work, it would be greatly appreciated.
I upload the data separately, the mean values for each treatment, and the pairwise comparisons (generated with agricolae::HSD.test) between all the groups with the p-value (the pairwise comparisons file includes duplicated data, as there is all possible combinations in both columns, meaning all treatments are presented in both columns)
#LOADING THE DATA
contrasts <- read.table("https://raw.githubusercontent.com/paracon/cld_data/main/contrasts.csv",
header=TRUE)
means <- read.table("https://raw.githubusercontent.com/paracon/cld_data/main/means.csv",
header=TRUE)
#################################################################################
# WE ORDER TREATMENTS BY MEAN VALUE
means <- means[rev(order(means$mean)),]
# WE CREATE A DATAFRAME WHERE ALL GROUPS WILL BE ADDED AND
# WHERE THE TREATMENT WITH HIGHEST MEAN IS ASSIGNED GROUP "a"
existing_groups <- rbind(data.frame(trts=character(),
groups=character()),
data.frame(trts=as.character(means$treat[1]),
groups=as.character(letters[1]))
)
# WE LOOP ALONG ALL TREATMENTS (FROM means FILE) AFTER THE ONE WITH HIGHEST MEAN
for (i in 2:length(means$treat)){
# WE SUBSET FOR ALL THE CONTRASTS FOR TREATMENT i
contrasts_i <- contrasts[as.character(contrasts$col1)==as.character(gsub(" ", "", means[i,]$treat, fixed = TRUE)),]
# WE CREATE AN EMPTY DATAFRAME WHERE WE WILL ADD ALL TREATMENTS NOT DIFFERENT FROM TREATMENT i
same_as_checked <- data.frame(trts=as.character(),
groups=as.character())
# WE LOOP ALONG ALL ALREADY EXISTING GROUPS
for (g in unique(existing_groups$groups)){
# WE SUBSET FOR ALL THE TREATMENTS IN GROUP g
existing_groups_g <- existing_groups[existing_groups$groups==g,]
# WE LOOP ALONG ALL THE TREATMENTS IN GROUP g
for (j in 1:length(existing_groups_g$trts)){
existing_groups_j <- existing_groups_g[j,]
existing_groups_j$trts <- as.character(gsub(" ", "", existing_groups_j$trts, fixed = TRUE))
# WE CHECK PAIRWISE COMPARISON BETWEEN TREATMENT j IN THE GROUP AND contrasts_i$col2
# AND ALL ELEMENTS OF THAT GROUP
try(if(contrasts_i[contrasts_i$col2==existing_groups_j$trts,]$p_val>=0.05){
same_as_checked <- rbind(same_as_checked,
data.frame(trts=as.character(existing_groups_j$trts),
groups=NA))
},silent=TRUE)
}
}
print(means[i,]$treat)
print(same_as_checked$trts)
# same_as_checked SHOULD INCLUDE ALL THE TREATMENTS WHICH ARE NOT DIFFERENT FROM TREATMENT i
group_with_no_differences_exists <- "no"
# WE LOOP AGAIN ALONG ALL ALREADY EXISTING GROUPS
# NOW TO COMPARE THE TREATMENTS IN same_as_checked WITH ALL EXISTING GROUPS
for (g in unique(existing_groups$groups)){
# WE SUBSET FOR ALL THE TREATMENTS IN GROUP g
existing_groups_g <- existing_groups[existing_groups$groups==g,]
# WE CHECK IF GROUP g IS IDENTICAL TO same_as_checked
try(group_with_no_differences_exists <- ifelse(isTRUE(all.equal(unique(same_as_checked[order(same_as_checked$trts),]$trts),
unique(existing_groups_g[order(existing_groups_g$groups),]$trts))),
"yes","no"), silent=TRUE)
# IF GROUP IS IDENTICAL, WE WILL ADD TREATMENT i TO THIS GROUP
if (group_with_no_differences_exists=="yes"){
new_groups <- data.frame(trts=as.character(unique(contrasts_i$col1)),
groups=as.character(letters[which(letters==existing_groups_j$groups)])
)
}
# IF GROUP IS DIFFERENT, WE CREATE A NEW GROUP, WITH TREATMENT i AND ALL THE TREATMENTS IN same_as_checked
#same_as_checked IS NOT EMPTY:
if (group_with_no_differences_exists=="no" & nrow(same_as_checked)!=0){
new_groups <- rbind(data.frame(trts=same_as_checked$trts,
groups=rep(x=as.character(letters[which(letters==existing_groups_j$groups) + 1]),
times=length(same_as_checked$groups))
),
data.frame(trts=as.character(unique(contrasts_i$col1)),
groups=as.character(letters[which(letters==existing_groups_j$groups) + 1])
)
)
}
#same_as_checked IS EMPTY, TREATMENT i 'IS ALONE' IN THE NEW GROUP:
if (group_with_no_differences_exists=="no" & nrow(same_as_checked)==0){
new_groups <- data.frame(trts=as.character(unique(contrasts_i$col1)),
groups=as.character(letters[which(letters==existing_groups_j$groups) + 1])
)
}
}
existing_groups <- rbind(existing_groups, new_groups)
}
#################################################################################
library(tidyverse)
unique(existing_groups) %>%
group_by(trts) %>%
dplyr::summarise(
groups = paste(as.character(groups), collapse="")
)
This works but generates redundant groups, which have to be removed. To remove these groups, I apply the not very beautiful code:
#COMPARING EACH GROUP WITH THE NEXT GROUP (BY ALPHABETICAL ORDER),
#IF ALL TREATMENTS IN A GROUP ARE INCLUDED IN THE NEXT GROUP, THE FORMER IS REMOVED (CONVERTED TO NA)
for (g in unique(as.character(existing_groups$groups)))try({
existing_trts_g <- as.character(unique(existing_groups[existing_groups$groups==g,]$trts))
existing_trts_g_1 <- as.character(unique(existing_groups[existing_groups$groups==as.character(letters[which(letters==g) + 1]),]$trts))
if (existing_trts_g_1[!(is.na(existing_trts_g_1))] %contain% existing_trts_g[!(is.na(existing_trts_g))]){
existing_groups[!(is.na(existing_groups$groups)) & existing_groups$groups==g,]$groups <- NA
}
existing_groups <- existing_groups[!(is.na(existing_groups$groups)),]
},silent=TRUE)
for (g in unique(as.character(existing_groups$groups))){
nth <- match(g,unique(as.character(existing_groups$groups)))
existing_groups[existing_groups$groups==g,]$groups <- letters[nth]
}
The expected groups (according agricolae::HSD.test) are:
a: B100; ab: KT50, T10; abc: T35, KT10; bc: T50, T100, KT35; c: KT100
You can see them here too: https://raw.githubusercontent.com/paracon/cld_data/main/output.csv
Maybe this helps.
Find more details here.
Created on 2021-06-03 by the reprex package (v2.0.0)