I have continuous gene expression data (log2TPM) on the X-axis and categorical data (gene groupings) on the Y-axis. I am using the following script to get the plot below:
ggplot(merged_data, aes(x = log2TPM, y=categ)) +
ggdist::stat_slab(justification = -.2, width = .6) +
geom_boxplot(outlier.color = NA, width = .15) +
ggdist::stat_spike(at=gene_data$log2TPM, justification = -.2)
Is there a way to properly place a text label on top of the spike? I am envisioning two ways: one with the label in the upper margin of the plotting area, and another, maybe more complicated, where the label tracks the half-eye plot (stat_slab).
The goal is to get something like this, after adding the label contained in gene_data$gene_name:
Here is a sample of what my data looks like:
r$> merged_data %>% select(log2TPM, gene_name, categ) %>% head()
  log2TPM  gene_name  categ
3.3575520  TSPAN6     All_13421
4.5084287  DPM1       All_13421
1.6507646  SCYL3      All_13421
0.9259994  C1orf112   All_13421
4.5637683  CFH        All_13421
4.8619554  FUCA2      All_13421


The short answer is "it's complicated". The exact position of the endpoints of the spikes is determined by the scaling of the thickness aesthetic (which depends on any thickness scales added to the plot and on the normalize and scale parameters to stat_slab), together with the justification and position arguments to the slab.

The first thing to do to make your life easier (and which you should always do when using stat_spike to label a slab) is to add scale_thickness_shared() to the plot. This will ensure that stat_spike() and stat_slab() use the same scaling function, and will fix the fact that in your current plot the endpoints of the spike do not lie on the density curve.

With shared scaling in place, in the default case (no other changes to justification, position, or side), the endpoint of the spike will be the y position plus the thickness times 0.9 (which is the default value of scale).

You can use stat_spike() with geom = "label" or geom = "text" and an after_scale() calculation to determine the location. You have to convert the thickness to a numeric because it is a subclass of numeric that otherwise can't be used directly as a position value. Here's a simple example with two groups:

You have to do as.numeric(thickness) instead of just thickness because thickness values are a subclass of numeric that won't work directly as positional values. See Details in help("ggdist::thickness") for more on that if you're curious.

To be honest, this all should be easier but that's the best approach I can suggest at the moment.
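As a rough illustration of the steps described above, here is a minimal sketch on simulated data. It is not the asker's dataset: the data frame df, its columns x and g, and the spike positions in spike_at are all made up for the example. It assumes a ggdist version that provides stat_spike() and scale_thickness_shared(), and it uses ggplot2's stage() so that y is still mapped to the group before the after-scale adjustment is applied, which is my assumption about how to wire up the after_scale() calculation.

```r
library(ggplot2)
library(ggdist)

# Hypothetical two-group data (not the asker's merged_data)
set.seed(1234)
df <- data.frame(
  x = rnorm(2000, mean = c(0, 1)),
  g = c("a", "b")
)

# Hypothetical x positions at which to draw and label spikes
spike_at <- c(0, 1.5)

p <- ggplot(df, aes(x = x, y = g)) +
  stat_slab() +
  stat_spike(at = spike_at) +
  # Second spike layer drawn as text: same stat, different geom.
  stat_spike(
    aes(
      # Label with the spike's x position (computed by the stat)
      label = after_stat(sprintf("%.2f", x)),
      # Map y to the group first, then move it to the spike tip:
      # y position + scaled thickness * 0.9 (0.9 = default `scale`)
      y = stage(g, after_scale = y + as.numeric(thickness) * 0.9)
    ),
    at = spike_at,
    geom = "text",
    vjust = -0.3  # nudge the text just above the spike tip
  ) +
  # Make the slab and both spike layers share one thickness scaling
  scale_thickness_shared()

p
```

The 0.9 multiplier only matches the default case. With a non-default justification, side, or scale (for example justification = -.2 as in the question), the offset would need to be adjusted accordingly. To show gene names instead of positions, the label mapping would have to translate each spike position back to its name.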