Multiple years as axes in a Sankey/alluvial diagram with percentages


I have a data.frame df with three columns: id, year, and class. id holds the user IDs, year takes the values {2018, 2019, 2020, 2021, 2022}, and class takes one of three values {class_A, class_B, class_C}. The dataset has more than 50K rows.

I would like to track the flow of users from one class to another over the years, shown as percentages rather than absolute numbers.

I have been following several examples, in particular this one, which uses the vaccinations dataset from ggalluvial:

library(ggplot2)
library(ggalluvial)
library(dplyr)

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))

vaccinations <- vaccinations %>% 
  group_by(survey) %>% 
  mutate(pct = freq / sum(freq))

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = pct,
           fill = response %in% c("Missing", "Never"), 
           label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = paste0(after_stat(stratum), "\n", scales::percent(after_stat(count), accuracy = .1))), stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

But I don't know how to use the years as axes (one axis per year) with the classes as strata.

Any guidance please.
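
For reference, here is a minimal sketch of the layout described above, not a tested solution for the real data. It assumes df is in long format with exactly one row per id and year; the df built below is a synthetic stand-in, and the names df_pct and p are hypothetical. Each row gets a weight of 1 / (number of users in that year), so the strata on every year axis sum to 100%.

```r
library(ggplot2)
library(ggalluvial)
library(dplyr)

# Synthetic stand-in for df (assumption: one row per id and year)
set.seed(1)
df <- expand.grid(id = 1:300, year = 2018:2022)
df$class <- sample(c("class_A", "class_B", "class_C"), nrow(df), replace = TRUE)

# Weight each row so that the weights within a year sum to 1
df_pct <- df %>%
  group_by(year) %>%
  mutate(pct = 1 / n()) %>%
  ungroup()

# year is the axis, class is the stratum, id ties a user's rows together
p <- ggplot(df_pct,
            aes(x = factor(year), stratum = class, alluvium = id,
                y = pct, fill = class)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = paste0(after_stat(stratum), "\n",
                               scales::percent(after_stat(count), accuracy = .1))),
            stat = "stratum", size = 3) +
  scale_x_discrete(expand = c(.1, .1)) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = "year", y = "share of users") +
  theme(legend.position = "none")

p
```

Because y is a per-row share, the stratum labels show each class's share of users within its year. With the real 50K-row data, drawing the flows may be slow, and users who are missing in some years would simply have no flow into or out of those years.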
