As an example, here is a stacked density chart based on the diamonds data. A simple table suggests that the "Fair" cut should be quite uncommon and "Ideal" the most common, but the five cuts seem to occupy a more or less equal area (proportion) on the stacked charts. That is surprising to me.
This is probably a conceptual misunderstanding on my side, and I'd be grateful for help. So, what exactly does a stacked density show?
library(tidyverse)
library(patchwork)
p1 <- diamonds %>%
  ggplot() +
  geom_density(aes(x = price, fill = cut), position = "stack")
p2 <- diamonds %>%
  ggplot() +
  geom_density(aes(x = price, fill = cut), position = "fill")
p1 / p2 + plot_layout(guides = "collect")
table(diamonds$cut)
#>
#>      Fair      Good Very Good   Premium     Ideal
#>      1610      4906     12082     13791     21551
Created on 2023-11-12 with reprex v2.0.2
The densities are being normalized (i.e. scaled to area 1) before being combined. If you don't want that, use y = after_stat(count).
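A minimal sketch of the after_stat(count) version, assuming the same diamonds data and aesthetics as in the question (the object name p_count is only illustrative):

```r
library(ggplot2)

# Same stacked plot as in the question, but y is mapped to the computed
# variable `count` (density multiplied by the number of cases in the group)
# instead of the default normalized density.
p_count <- ggplot(diamonds) +
  geom_density(
    aes(x = price, y = after_stat(count), fill = cut),
    position = "stack"
  )

p_count
```

With this mapping the stacked bands should be sized roughly in proportion to the table counts, so "Fair" becomes a thin sliver and "Ideal" the thickest band.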
Edited to add:
Here's how this works:
geom_density() always computes the normalized density for each group. Setting y = after_stat(count) tells it to multiply the density by the count for the group, so you get "number of cases per unit of price" instead of "probability per unit of price", the usual units of the density.
If you'd like to keep the density scale, you can divide the count by the total number of cases, e.g. y = after_stat(count / sum(tapply(n, group, unique))).
This will actually show "subdensities", where the total area is 1, and the area of each region is the proportion of cases of that class. (The expression sum(tapply(n, group, unique)) is a fancy way to compute nrow(diamonds) without referring to the particular dataset. It says: find the unique value of n in each group, and add them up. Since n is the same for all entries in the group, that gives the overall total. Thanks to @Stefan for the trick.)
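As a sketch of the subdensity idea, the following first checks the total-count trick on a small made-up example and then applies it inside after_stat(). The toy vectors and the names total and p_sub are assumptions for illustration only; n and group are the columns that the density stat computes for the layer.

```r
library(ggplot2)

# Toy illustration of the trick: two groups with 3 and 2 cases.
# `n` repeats the group size on every row, as the density stat does.
n <- c(3, 3, 3, 2, 2)
group <- c(1, 1, 1, 2, 2)
total <- sum(tapply(n, group, unique))  # one n per group, then summed

# Subdensities: each group's count curve divided by the overall total,
# so the stacked areas sum to about 1 and each band's area is roughly
# that cut's share of the cases.
p_sub <- ggplot(diamonds) +
  geom_density(
    aes(
      x = price,
      y = after_stat(count / sum(tapply(n, group, unique))),
      fill = cut
    ),
    position = "stack"
  )

p_sub
```

The vertical axis stays on the density scale here, whereas plain after_stat(count) changes it to cases per unit of price.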