Plotting weighted distribution


Let's say I have the following data:

c1 <- runif(100, 0,1)
c2 <- runif(100, 0,1)
weights <- runif(100, 1,50)
categorie <- rbinom(100,1,0.5)

df <- as.data.frame(cbind(c1,c2,weights,categorie))

I would like to show, in the same plot, the two distributions of c1: one for categorie = 0 and one for categorie = 1, with each observation weighted by the variable weights. Furthermore, on the y axis I would like weighted proportions, not weighted counts.

I would like to make a graph like this: [image]

How can I do that with ggplot2?

Thanks a lot!

There is 1 best solution below

Allan Cameron

From the comments, it appears you are looking for two groups according to categorie. Then you want a histogram which shows the weighted percentage of each bin as it applies to each group. I would probably pre-calculate this and draw it with geom_col:

library(dplyr)    # reframe() and .by require dplyr >= 1.1.0
library(ggplot2)

df %>%
  # Sum the weights in each 0.05-wide bin of c1, separately per category
  reframe(x = seq(0.025, 0.975, 0.05),
          weight = as.vector(tapply(weights, cut(c1, seq(0, 1, 0.05)), sum)),
          .by = categorie) %>%
  # Empty bins come back as NA; treat them as zero weight
  mutate(y = ifelse(is.na(weight), 0, weight)) %>%
  # Convert weighted totals to weighted proportions within each category
  mutate(y = y / sum(y), .by = categorie) %>%
  ggplot(aes(x, y, fill = factor(categorie))) +
  geom_col(position = 'identity', alpha = 0.5) +
  scale_y_continuous('Percent', labels = scales::percent) +
  scale_x_continuous('c1', breaks = seq(0, 1, 0.1)) +
  scale_fill_manual('Category', values = c('orangered', 'deepskyblue4')) +
  theme_bw(base_size = 16)

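As a possible alternative to pre-calculating the bins, here is a minimal sketch that lets ggplot2 do the weighting through the weight aesthetic of geom_histogram. It assumes ggplot2 >= 3.3.0 (for after_stat) and relies on the density and width variables computed by stat_bin, which are calculated per group, so density * width should be each bin's share of the total weight within its category. The set.seed call and the object name p are my own additions for reproducibility, not part of the question.

```r
library(ggplot2)

# Same simulated data as in the question (seed added only for reproducibility)
set.seed(1)
df <- data.frame(
  c1        = runif(100, 0, 1),
  c2        = runif(100, 0, 1),
  weights   = runif(100, 1, 50),
  categorie = rbinom(100, 1, 0.5)
)

p <- ggplot(df, aes(x = c1, weight = weights, fill = factor(categorie))) +
  geom_histogram(
    # density * width = weighted count in the bin / total weight of the group
    aes(y = after_stat(density * width)),
    breaks = seq(0, 1, 0.05),
    position = "identity", alpha = 0.5
  ) +
  scale_y_continuous("Percent", labels = scales::percent) +
  labs(x = "c1", fill = "Category")

p
```

A quick sanity check is that the bar heights in each category add up to 100%. The pre-calculated geom_col version gives you more control over the binning, while this one keeps everything inside the plot call.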