Alluvial plot in R: how to space the strata?


Background

I have been working on an alluvial plot (a kind of Sankey diagram) using ggplot2 and the ggalluvial package to visualize how frequencies change over time and where those changes originate.

As an example, I have created a simple dataset of 100 imaginary patients who are screened for COVID-19. At baseline, all patients are negative for COVID-19. One week later, all patients are tested again: now 30 patients are positive, 65 are negative and 5 have an inconclusive result. Another week later, the 30 positive patients remain positive, 10 patients go from negative to positive, and the remaining 60 are negative.

data <- data.frame(analysis = as.factor(rep(c("time0", "time1", "time2"), each = 4)), 
                   freq = rep(c(30, 10, 55, 5), 3), 
                   track = rep(1:4, 3),  
                   response = c("neg","neg","neg","neg", "pos", "neg", "neg", "inconc", "pos", "pos", "neg", "neg"))

#   analysis freq track response
#1     time0   30     1      neg
#2     time0   10     2      neg
#3     time0   55     3      neg
#4     time0    5     4      neg
#5     time1   30     1      pos
#6     time1   10     2      neg
#7     time1   55     3      neg
#8     time1    5     4   inconc
#9     time2   30     1      pos
#10    time2   10     2      pos
#11    time2   55     3      neg
#12    time2    5     4      neg
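
As a quick sanity check that the dataset matches the description above, the frequencies can be summed per time point and response. This is only a sketch in base R; the data frame is rebuilt here so the snippet runs on its own, and the name `counts` is introduced purely for illustration.

```r
# Same data as above, repeated so this snippet is self-contained
data <- data.frame(analysis = as.factor(rep(c("time0", "time1", "time2"), each = 4)),
                   freq = rep(c(30, 10, 55, 5), 3),
                   track = rep(1:4, 3),
                   response = c("neg", "neg", "neg", "neg",
                                "pos", "neg", "neg", "inconc",
                                "pos", "pos", "neg", "neg"))

# Sum freq for every combination of time point (rows) and response (columns)
counts <- xtabs(freq ~ analysis + response, data = data)
counts
```

Each row of `counts` should add up to the 100 patients, with 30 positives at time1 and 40 at time2.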

Goal

The goal is to create an alluvial plot that shows the 'tracks' (i.e., alluvia) of these patients over time and thereby the origin of each result after two weeks.

Attempt

I managed to make the major part of the figure:

library(tidyverse)
library(ggalluvial)

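# Note: an outline colour passed to ggplot() itself is ignored, so it is set on geom_stratum()
ggplot(data, aes(x = analysis, stratum = response, alluvium = track, y = freq, fill = response)) +
    geom_flow(stat = "alluvium") +
    geom_stratum(alpha = .5, color = "black") +
    # Values follow the alphabetical order of response: inconc, neg, pos
    scale_fill_manual(values = c("grey", "green", "red"))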


Question

However, I cannot clearly distinguish the strata from one another. Currently they sit directly on top of each other, so each time point looks like one completely filled rectangle.

How do you space the strata/alluvia in an alluvial plot using the ggalluvial package in R?

Answer 1 (accepted)

The author of the ggalluvial package defines alluvial plots as follows:

Alluvial plots are parallel sets plots in which classes are ordered consistently across dimensions and stacked without gaps at each dimension.

By that definition, the strata in an alluvial plot are not spaced. You probably want a Sankey plot instead; a reasonable package for that is ggsankey.
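
A minimal sketch of what the ggsankey route could look like for the data in the question. It assumes ggsankey is installed from GitHub (davidsjoberg/ggsankey) and that `geom_sankey()` accepts a `space` argument for the vertical gap between nodes, as I understand its documentation; the names `wide`, `long` and `p` and the gap of 5 are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggsankey)  # assumed installed, e.g. remotes::install_github("davidsjoberg/ggsankey")

# Data from the question, repeated so the snippet runs on its own
data <- data.frame(analysis = as.factor(rep(c("time0", "time1", "time2"), each = 4)),
                   freq = rep(c(30, 10, 55, 5), 3),
                   track = rep(1:4, 3),
                   response = c("neg", "neg", "neg", "neg",
                                "pos", "neg", "neg", "inconc",
                                "pos", "pos", "neg", "neg"))

# One row per patient: spread the time points into columns, then repeat each track freq times
wide <- data |>
  pivot_wider(names_from = analysis, values_from = response) |>
  uncount(freq)

# ggsankey expects long data with x, node, next_x and next_node columns
long <- wide |>
  make_long(time0, time1, time2)

p <- ggplot(long, aes(x = x, next_x = next_x,
                      node = node, next_node = next_node,
                      fill = node)) +
  # space sets the gap between nodes in y units (here: patients); 5 is an arbitrary choice
  geom_sankey(flow.alpha = 0.5, node.color = "black", space = 5) +
  scale_fill_manual(values = c(inconc = "grey", neg = "green", pos = "red")) +
  labs(x = NULL, fill = "response") +
  theme_minimal()

p
```

Because the nodes are separated, the y axis no longer reads as a clean count, which is the trade-off against the gapless alluvial layout.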

Answer 2

Alluvial plots do not leave any separation between the "lodes" within a stratum. With ggalluvial you can still tell them apart by colour, filling the flows by track and the lodes by response:


library(ggplot2)
library(ggalluvial)

# Treat track as a factor so it gets a discrete fill
data$track <- factor(data$track)

ggplot(data, aes(x = analysis, y = freq, stratum = response, alluvium = track)) +
  # Flows are filled by track, lodes by response
  geom_flow(aes(fill = track), stat = "alluvium") +
  geom_lode(aes(fill = response)) +
  geom_text(stat = "stratum", aes(label = response)) +
  # A single fill scale covers both tracks and responses; responses get empty legend labels
  scale_fill_manual("Track",
                    breaks = c("1", "2", "3", "4", "neg", "pos", "inconc"),
                    labels = c("1", "2", "3", "4", "", "", ""),
                    values = c("grey15", "grey40", "grey65", "grey90", "red", "green", "orange")) +
  # Make the response keys transparent so only the tracks show in the legend
  guides(fill = guide_legend(override.aes = list(alpha = c(`1` = 1, `2` = 1, `3` = 1, `4` = 1,
                                                          neg = 0, pos = 0, inconc = 0)))) +
  theme_minimal()

Created on 2021-04-18 by the reprex package (v2.0.0)