Is there a way to add labels with the number of observations per value to a violin plot in ggplot?


Imagine you want to create a violin plot and have data like this:

library(ggplot2)

set.seed(123)
Bodytype <- sample(LETTERS[1:3], 500, replace = TRUE)
Weight <- rnorm(500, 40, 1)
df <- data.frame(Bodytype, Weight)

ggplot(data = df,
       aes(x = Bodytype, y = Weight, fill = Bodytype)) +
  geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
  scale_y_continuous(breaks = seq(34, 46, 1)) +
  theme_gray()

Now I would like to add a text label (or something similar) for each body type at each kg level, to see how many observations in each body type category weigh 36 kg, 37 kg, etc. Is there a way to achieve this, or am I better off using another plot?


There are 2 best solutions below

2
On BEST ANSWER

This can be done in many ways, here is one:

library(dplyr)
library(ggplot2)

# One row per Bodytype: group size (n) and mean Weight (used as the label's y position)
summ <- df %>%
  group_by(Bodytype) %>%
  summarize(n = n(), Weight = mean(Weight))

ggplot(data = df, aes(x = Bodytype, y = Weight, fill = Bodytype)) +
  geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
  scale_y_continuous(breaks = seq(34, 46, 1)) +
  theme_gray() +
  geom_text(aes(label = n), data = summ)

[Figure: violin plot with count labels at the group means]


Okay, so you want multiple weight counts:

# Round each weight to the nearest kg, then count observations per Bodytype and kg
weightcounts <- df %>%
  mutate(Weight = as.integer(round(Weight, 0))) %>%
  group_by(Bodytype, Weight) %>%
  count()

ggplot(data = df, aes(x = Bodytype, y = Weight, fill = Bodytype)) +
  geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
  scale_y_continuous(breaks = seq(34, 46, 1)) +
  theme_gray() +
  geom_text(aes(label = n), data = weightcounts)

[Figure: second violin plot, with per-weight count labels]

Either way, the premise is that you can generate a summary frame with the associated labels you need, and then add geom_text (or geom_label) with the new dataset as an argument.
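
As a rough sketch of the same idea with geom_label instead of geom_text: the version below rebuilds the example data, counts observations per Bodytype and rounded kg with dplyr::count(), and draws the counts as boxed labels. Moving fill into geom_violin() (so the label boxes keep their default background) and size = 3 are my own choices for readability, not something the answer above prescribes.

```r
library(dplyr)
library(ggplot2)

# Same example data as in the question
set.seed(123)
df <- data.frame(
  Bodytype = sample(LETTERS[1:3], 500, replace = TRUE),
  Weight   = rnorm(500, 40, 1)
)

# One row per Bodytype / rounded-kg combination; count() stores the size in column n
weightcounts <- df %>%
  mutate(Weight = round(Weight)) %>%
  count(Bodytype, Weight)

# fill is mapped only in the violin layer, so the label layer
# inherits just x and y from the top-level aes()
p <- ggplot(df, aes(x = Bodytype, y = Weight)) +
  geom_violin(aes(fill = Bodytype), scale = "count", trim = FALSE, adjust = 0.75) +
  geom_label(aes(label = n), data = weightcounts, size = 3) +
  scale_y_continuous(breaks = seq(34, 46, 1))

print(p)
```

Since every observation falls into exactly one rounded-kg bin, the labels within each violin should add up to that group's total, which is a quick way to sanity-check the summary frame.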

0
On

Another way is to compute the labels outside ggplot, in base R:

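library(ggplot2)

set.seed(123)
Bodytype <- sample(LETTERS[1:3], 500, replace = TRUE)
Weight <- rnorm(500, 40, 1)
df <- data.frame(Bodytype, Weight)

# Labels: sum a helper column of ones to get the count per Bodytype
df$i <- 1
labs <- aggregate(i ~ Bodytype, df, sum)
labs$Weight <- NA  # placeholder column; the label layer sets y itself

# Plot (each layer must end with `+` so the next one is added to the plot)
ggplot(data = df,
       aes(x = Bodytype, y = Weight, fill = Bodytype)) +
  geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
  geom_text(data = labs, aes(x = Bodytype, y = 45, label = i)) +
  scale_y_continuous(breaks = seq(34, 46, 1)) +
  theme_gray()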

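
A possible variation on this base R approach, assuming you would rather not add a helper column to df: table() already gives the per-group totals, and as.data.frame() turns them into a frame with a Freq column. The y = 45 label position is carried over from the code above; inherit.aes = FALSE is my own addition, so that the label layer does not need Weight or fill in its data.

```r
library(ggplot2)

# Same example data as in the question
set.seed(123)
df <- data.frame(
  Bodytype = sample(LETTERS[1:3], 500, replace = TRUE),
  Weight   = rnorm(500, 40, 1)
)

# table() counts rows per Bodytype; as.data.frame() yields columns Bodytype and Freq
labs <- as.data.frame(table(Bodytype = df$Bodytype))

p <- ggplot(df, aes(x = Bodytype, y = Weight, fill = Bodytype)) +
  geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
  # inherit.aes = FALSE: this layer uses only the aesthetics given here
  geom_text(data = labs, aes(x = Bodytype, y = 45, label = Freq),
            inherit.aes = FALSE) +
  scale_y_continuous(breaks = seq(34, 46, 1)) +
  theme_gray()

print(p)
```

This only gives one total per violin; for a count at every kg level, the summary frame would need one row per Bodytype and rounded weight, as in the accepted answer.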