This is my sample of a big data matrix & and each column has named with multiple information and separated by an underscore.
structure(list(Gene = c("AGI4120.1_UBQ", "AGI570.1_Acin"), WT_Tissue_0T_1 = c(0.886461437, 1.093164915), WT_Tissue_0T_2 = c(1.075140682, 1.229862834), WT_Tissue_0T_3 = c(0.632903012, 1.094003128), WT_Tissue_1T_1 = c(0.883151274, 1.26322126), WT_Tissue_1T_2 = c(1.005627276, 0.962729188), WT_Tissue_1T_3 = c(0.87123469, 0.968078993), WT_Tissue_3T_1 = c(0.723601456, 0.633890322), WT_Tissue_3T_2 = c(0.392585237, 0.534819363), WT_Tissue_3T_3 = c(0.640185369, 1.021934772), WT_Tissue_5T_1 = c(0.720291294, 0.589244505), WT_Tissue_5T_2 = c(0.362131744, 0.475251717), WT_Tissue_5T_3 = c(0.549486925, 0.618177919), mut1_Tissue_0T_1 = c(1.464415756, 1.130533457), mut1_Tissue_0T_2 = c(1.01489573, 1.114915728), mut1_Tissue_0T_3 = c(1.171797418, 1.399956009), mut1_Tissue_1T_1 = c(0.927507448, 1.231911575), mut1_Tissue_1T_2 = c(1.089705396, 1.256782289 ), mut1_Tissue_1T_3 = c(0.993048659, 0.999044465), mut1_Tissue_3T_1 = c(1.000993049, 1.103486794), mut1_Tissue_3T_2 = c(1.062562066, 0.883617224 ), mut1_Tissue_3T_3 = c(1.037404833, 0.851875438), mut1_Tissue_5T_1 = c(0.730883813, 0.437440083), mut1_Tissue_5T_2 = c(0.480635551, 0.298762126 ), mut1_Tissue_5T_3 = c(0.85468388, 0.614923997)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Gene = structure(list(), class = c("collector_character", "collector")), WT_Tissue_0T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_0T_2 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_0T_3 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_1T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_1T_2 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_1T_3 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_3T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_3T_2 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_3T_3 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_5T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_5T_2 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_5T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_0T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_0T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_0T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_1T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_1T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_1T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_3T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_3T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_3T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_5T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_5T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_5T_3 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
I want to follow a Tukey Test and plot bar charts for each Gene (Response vs. Time; filled by both the genotypes) with multiple comparisons letters.
Syntax
df1 <- df %>%
gather(var, response, WT_Tissue_0T_1:mut1_Tissue_5T_3) %>%
separate(var, c("Genotype", "Tissue", "Time"), sep = "_") %>%
arrange(desc(Gene))
df2 <- df1 %>%
group_by(`Gene`,Genotype,Tissue,Time) %>%
mutate(Response=mean(response),n=n(),se=sd(response)/sqrt(n))
Two-way ANOVA
fit1 <- aov(Response ~ Genotype*Time, df2)
Hereafter, I want to go proceed Tukey Test (Multiple Comparison), for e.g. Gene "AGI4120.1_UBQ", plot Response vs. Time, and see how each genotype (WT & mut1) behave in each Time point (0T, 1T, 3T, and 5T)? if the response is significantly different or not and denote by letters in the plot.
As below lsmeans syntax combine all the Genes to one and give output, how can I make it to loop for each gene separately (i.e. "AGI4120.1_UBQ", "AGI570.1_Acin") and get letters to show statistically distinct groups (aka "compact letter display")
lsmeans(fit1, pairwise ~ Genotype | Time)
My final aim is to plot each gene like this below graph and denote significance letters.
df2$genotype <- factor(df2$genotype, levels = c("WT","mut1")) colours <- c('#336600','#ffcc00')library(ggplot2)ggplot(df2,aes( x=Time, y=Response, fill=Genotype))+geom_bar(stat='identity', position='dodge')+scale_fill_manual(values=colours)+geom_errorbar(aes(ymin=average_measure-se, ymax=average_measure+se)+facet_wrap(~`Gene`)+labs(x='Time', y='Response')
Expecting Graph for denoting compact letter display
I would appreciate your kind help, if possible.
There are a number of issues with your code. I'd go so far as to say it's not really an appropriate post for StackOverflow, as the problems here are diverse and not really extendable beyond this specific constellation of bugs and syntax issues. But I'll post a few suggestions as an answer to get you started - hope they help.
1.
lsmeans()
:This function expects a fitted model (like from
lm()
), or aref.grid
object. But you are passing it a data frame, without any of the regression properties it needs to compute least-squares means. (Think: How doeslsmeans()
know what the dependent variable should be, when you ask for pairwise comparison withGenes
as the independent variable?) Check out the Usinglsmeans
documentation for more details.Based on your data, I'd say you probably want to run a multi-level regression, using something like the
lme4
package, withGene
andGenotype
and maybeTime
as nested grouping levels.But for demonstration I'll keep it simple with
lm()
. Passing a fitted regression object tolsmeans()
works as intended:2.
ggplot()
:You haven't defined
colours
oraverage_measure
in the code you've provided; calling these undeclared variables will cause failure.Structurally, I would suggest you use
df1
and allowggplot
to do the grouping, rather thangroup_by
intodf2
. Then usestat="summary"
andfun.y="mean"
to accomplish thesummarise()
computations you did indf2
. Doing this also allows you to use themean_se
function for your error bars. Like this:Finally, note that your use of
separate()
indf1
will throw a warning, although not an error, as there is an extra value created by splitting onsep="_"
. You can avoid that (if it is causing confusion) by adding a level to capture the final value (which appears to be a time index):