I am trying to visualize the results of an ANOVA and a post-hoc Tukey test in a bar plot. This works so far, but my letters are in the wrong order: the bar "mit Downsampling" (2) should be the one that differs from the others, not the first one, "ohne Sampling".
The code I am using:
library(ggplot2)
library(multcompView)
library(reshape2)
a.lda <- read.table(header = TRUE, text = " rowname ohne_Sampling mit_Downsampling mit_Upsampling Gewichtung
1 Fold1 0.6732673 0.8390805 0.7192982 0.6732673
2 Fold2 0.7227723 0.8181818 0.7105263 0.7227723
3 Fold3 0.7100000 0.7586207 0.6842105 0.7100000
4 Fold4 0.6633663 0.8295455 0.7105263 0.6633663
5 Fold5 0.7128713 0.8750000 0.7017544 0.7128713")
#Transformation of the dataframe to get a format ggplot2 is able to use
a.lda <- melt(a.lda, id.vars="rowname")
#data_summary Function
data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE),
      minimum = min(x[[col]], na.rm=TRUE))
  }
  data_sum <- ddply(data, groupnames, .fun=summary_func,
                    varname)
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}
a.sd.lda <- data_summary(a.lda, varname = "value", groupnames = "variable")
#ANOVA+Tukey
a.anova <- aov(data=a.lda, value ~ variable)
tukey <- TukeyHSD(a.anova)
cld <- as.data.frame.list((multcompLetters4(a.anova,tukey))$variable)
#The wrong letters do already appear here
a.sd.lda$cld <- cld$Letters
So by checking the a.sd.lda table, one can already see the wrong letters: a,b,b,b instead of a,b,a,a. Also, according to the Tukey results, there is NO significant difference between ohne Sampling, mit Upsampling and Gewichtung. So I guess the multcompLetters4() function is causing the wrong order.
I would be so thankful for any suggestions!!!
Searching for an answer, I found this Stack Overflow entry (Wrong Tukey-letter ordering in R multcompView package), but none of the answers solved my problem.
Just to round things up, this is the code for the visualization, although the mistake in my code has to be above:
#Visualization
ldaplot <- ggplot(a.sd.lda, aes(variable,value,fill=variable))+
labs(title="LDA")+
scale_x_discrete(guide = guide_axis(n.dodge=2))+
coord_cartesian(ylim=c(y_min,1))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
scale_fill_brewer(palette="YlOrBr")+
geom_text(data = a.sd.lda, aes(x = variable, y = value, label = cld), size = 5, vjust=-.5, hjust=-.7)+
geom_errorbar(aes(ymin=value-sd, ymax=value+sd), width=.2,
position=position_dodge(.9))+
labs(x="", y="Accuracy")+
geom_abline(aes(intercept=Akzeptanzwert,slope=0), color="red")
The source code of the multcompLetters4() function and related ones is available here: https://rdrr.io/cran/multcompView/f/
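As far as I can tell, multcompLetters4() returns the letters sorted by group mean rather than in factor-level order, so assigning cld$Letters by position can attach them to the wrong rows. A minimal sketch of looking the letters up by group name instead (acc, fit, means and letters_by_group are names made up for this illustration; the values are the ones from the question):

```r
library(multcompView)

# Accuracy values from the question, already in long format
# (acc is a hypothetical name, not from the original code)
lv <- c("ohne_Sampling", "mit_Downsampling", "mit_Upsampling", "Gewichtung")
acc <- data.frame(
  variable = factor(rep(lv, each = 5), levels = lv),
  value = c(0.6732673, 0.7227723, 0.7100000, 0.6633663, 0.7128713,
            0.8390805, 0.8181818, 0.7586207, 0.8295455, 0.8750000,
            0.7192982, 0.7105263, 0.6842105, 0.7105263, 0.7017544,
            0.6732673, 0.7227723, 0.7100000, 0.6633663, 0.7128713)
)

fit   <- aov(value ~ variable, data = acc)
tukey <- TukeyHSD(fit)

# Named character vector: names are the group labels
letters_by_group <- multcompLetters4(fit, tukey)$variable$Letters

# Summary table in factor-level order
means <- aggregate(value ~ variable, data = acc, FUN = mean)

# Match by name, not by position
means$cld <- letters_by_group[as.character(means$variable)]
print(means)
```

With this lookup, the letter for each bar no longer depends on the order in which multcompLetters4() happens to list the groups.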
Hmm, I see you have already solved your own problem, but here is my solution anyway:
I would argue that this is all the code needed for getting descriptive statistics and the compact letter display from the Tukey's test.
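A minimal sketch of what such an approach could look like, assuming the emmeans, multcomp and multcompView packages are installed. The data frame is rebuilt here from the values in the question so the block runs on its own; model and emm are names chosen for illustration, and only emmcld is taken from the text:

```r
library(emmeans)
library(multcomp)      # provides the cld() generic
library(multcompView)  # used by cld() to build the letters

# Long-format data, same values as in the question
lv <- c("ohne_Sampling", "mit_Downsampling", "mit_Upsampling", "Gewichtung")
a.lda <- data.frame(
  variable = factor(rep(lv, each = 5), levels = lv),
  value = c(0.6732673, 0.7227723, 0.7100000, 0.6633663, 0.7128713,
            0.8390805, 0.8181818, 0.7586207, 0.8295455, 0.8750000,
            0.7192982, 0.7105263, 0.6842105, 0.7105263, 0.7017544,
            0.6732673, 0.7227723, 0.7100000, 0.6633663, 0.7128713)
)

# One-way model, estimated marginal means, Tukey-adjusted comparisons
model <- lm(value ~ variable, data = a.lda)
emm   <- emmeans(model, specs = "variable")
emmcld <- as.data.frame(
  cld(emm, adjust = "tukey", Letters = letters, alpha = 0.05)
)

# The letters are stored in the .group column, padded with spaces
emmcld$.group <- trimws(emmcld$.group)
print(emmcld)
```

Because each row of emmcld carries its own group label, mean and letter, there is no separate letter vector that could be attached in the wrong order.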
Note that for the plots I am only using the raw data a.lda and the result output emmcld:
reproducing your plot
alternative plot suggestion
..because some people don't like "dynamite plots" as nicely described in this blogpost
Created on 2022-01-27 by the reprex package (v2.0.1)