I noticed strange behaviour in the raincloud plot code for R. Specifically, the density curves are sensitive to data values in some (but not all) cases. It seems that the size of the distribution curve drawn by geom_flat_violin() for one group is somehow linked to the data values of other groups (which it shouldn't be), and I can't find how to restore their independence. The only clue I managed to find: the curves are shrunk based on the lowest values in the data, although the shrinkage affects the whole panel where those values occur, not just the sub-group containing them.
Below is a reproducible example, and a link to its image output to show what I mean. Just a note in advance: the raincloud package (presented in this paper) is not on CRAN afaik, so I sourced it directly from the authors' GitHub repo. I also tried an alternative source file, which reproduces the same problem. Other implementations such as ggridges::geom_density_ridges() or {ggdist} didn't give the same level of control over graphic parameters (e.g. smoothing), unless I'm missing something.
Example code:
library(reshape2)
library(ggplot2)
source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R")
# load data and melt into longform
data(iris)
miris <- melt(iris, id.vars = "Species", measure.vars = colnames(iris)[1:4],
              variable.name = "measurement")
## 1- plotting as is gives horizontally "squashed" curves in two of four panels
ggplot(miris, aes(x = Species, y = value, fill = Species)) +
  geom_flat_violin(position = position_nudge(x = .15, y = 0)) +
  facet_wrap(~measurement)
## 2- manipulating the group of smallest values seems to fix the relevant panel (but fixing other groups doesn't fix the problem - I tried that)
airis <- miris
# get indices of data to manipulate
inds <- intersect(which(airis$Species == "setosa"), which(airis$measurement == "Petal.Width"))
# assign larger values (seed fixed so the random draw is reproducible)
set.seed(42)
airis$value[inds] <- rnorm(length(inds), 3, 0.5)
ggplot(airis, aes(x = Species, y = value, fill = Species)) +
  geom_flat_violin(position = position_nudge(x = .15, y = 0)) +
  facet_wrap(~measurement)
## this second plot shows larger distribution curves for all species in the "Petal.Width" panel, although values were only changed for "setosa"
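As a rough way to test the "lowest values" clue, here is a minimal base-R sketch (not part of the raincloud code). For each panel and species it tabulates the smallest value, the standard deviation, and the peak of a default `density()` estimate, plus each peak relative to the tallest peak in its panel. The `peaks` data frame and its column names are my own. I am also assuming that `density()` with default settings is close enough to what the violin stat computes internally; I have not checked that against the geom_flat_violin() source.

```r
# Rough diagnostic using base R only: summarise every panel/species group.
data(iris)

measurements <- colnames(iris)[1:4]

peaks <- do.call(rbind, lapply(measurements, function(m) {
  do.call(rbind, lapply(levels(iris$Species), function(s) {
    v <- iris[iris$Species == s, m]
    d <- density(v)  # default Gaussian kernel and default bandwidth
    data.frame(
      measurement  = m,
      Species      = s,
      min_value    = min(v),    # the "lowest values" clue
      sd           = sd(v),     # spread of the group
      peak_density = max(d$y)   # height of the tallest point of the curve
    )
  }))
}))

# Height of each group's peak relative to the tallest peak in the same panel
peaks$relative_peak <- peaks$peak_density /
  ave(peaks$peak_density, peaks$measurement, FUN = max)

print(peaks, digits = 3)
```

If the squashing follows `relative_peak` rather than `min_value`, that would suggest the curves are being scaled against the tallest curve in the panel rather than against the lowest data values. This is only a check, not a fix.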
Does anyone know where the problem might be, or what can be done to fix it?
Many thanks!