ggforce in R to create sankey diagram with NA values and geom_parallel_sets

My goal is to create a Sankey diagram in which every column of nodes is aligned in parallel with the others, rather than the default layout where nodes are not aligned with the next column(s). I asked about this earlier but have not received an answer: geom_sankey in R: spacing and aligning nodes.

Here is the output of my attempt using geom_sankey, along with the issues with it: [image]

This post: Sankey diagram in R: How to change the height (Y) of individual sections related to each node? has me convinced that I am going about this the wrong way and that I should try the ggforce package.

The crux of the issue: I cannot figure out how to format my data so that the split aesthetic in ggplot() and the fill aesthetic in geom_parallel_sets() are both satisfied. Here is a made-up example; my real data is of a similar 'flavor.'

Example

#Making df

Years <- data.frame(Earlier = c(rep(2012, 2), paste(2013), paste(2014), rep(2015, 2), rep(2018, 2), rep(2022, 2), rep(NA, 31)),
           Latest = c(rep(2023, 4), rep(2022, 6), rep(2021, 10), rep(2020, 3), rep(2019, 6), rep(2018, 3), rep(2017, 3), rep(2013, 4), rep(NA, 2)),
           Current = c(rep(2023, 10), rep(2022, 12), rep(2021, 11), rep(2020, 1), rep(NA, 7)))

#Shuffling

set.seed(123)
Years <- Years[sample(nrow(Years)), ]  # assign the result, otherwise the shuffle is discarded

#Changing all data.frame to numeric

ix <- 1:3
Years[ix] <- lapply(Years[ix], as.numeric)

#putting it in ggforce format

Years2 <- ggforce::gather_set_data(Years, 1:3)

This gives the following output (first 10 rows): [image]

According to the posts about building Sankey diagrams with ggforce (like the one I linked above), I need to supply the split and fill aesthetics, but as you can see, splitting by column x will not give me the desired output. Additionally, I would like to fill by year, with each year having its own unique color, and I would also like the column names to appear on the graph, like the image above.

Here is the code I am using and I am putting ??? where I am stuck.

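library(ggplot2)
library(ggforce)

# aw and sp are not defined anywhere in this post; 0.1 is assumed here
# to match the values passed to geom_parallel_sets_axes() below
aw <- 0.1  # axis width
sp <- 0.1  # separation between nodes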

ggplot(Years2, 
       aes(x = x, id = id, split = ???, value = ???)) +
  geom_parallel_sets(aes(fill = ???), alpha = 0.3, 
                     axis.width = aw, sep = sp) +
  geom_parallel_sets_axes(axis.width = 0.1, sep = 0.1) +
  geom_parallel_sets_labels(colour = "white", 
                            angle = 0, size = 3,
                            axis.width = aw, sep = sp) +
  theme_minimal()

I have tried many, many things. Some notable efforts include: adding another column called 'split' to the Years2 data frame and filling it with 1, 2, 3 at the points where the 'Earlier', 'Latest', and 'Current' values start turning to NA; using the melt() function from reshape2; and using Years %>% make_long(Earlier, Latest, Current), which is the format required by geom_sankey.

Extra info from sessionInfo():
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Any help navigating this quagmire would be greatly appreciated. Thank you.

There is 1 best solution below

Hopefully this is what you're looking for. According to the documentation, geom_parallel_sets() requires value to be provided as an aesthetic. I assume value represents the frequency of connections between nodes (that is, the thickness of the links). You can get these counts using table() and reshape2::melt().
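To make the counting step concrete, here is a minimal base-R sketch on a small made-up data frame (`toy` is hypothetical, not the asker's data). It uses as.data.frame() rather than reshape2::melt() purely to avoid the extra dependency. It also shows one thing worth knowing for data with missing years: table() drops any row that contains an NA unless useNA is set.

```r
# Hypothetical toy data: 4 rows, two of which contain an NA
toy <- data.frame(
  Earlier = c(2012, 2012, NA, NA),
  Latest  = c(2023, 2023, 2022, NA)
)

# Default behaviour: rows with an NA in any column are not counted
counts <- as.data.frame(table(toy), responseName = "value")
print(counts)
sum(counts$value)     # number of rows that were actually counted

# Keep NA as its own category so those rows are counted too
counts_na <- as.data.frame(table(toy, useNA = "ifany"), responseName = "value")
print(counts_na)
sum(counts_na$value)  # now every row of toy is accounted for
```

Whether NA should appear as its own node or be dropped is a design choice; the default simply removes incomplete rows from the diagram.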

library(ggplot2)
library(ggforce)

# data
Years <- data.frame(Earlier = c(rep(2012, 2), paste(2013), paste(2014), rep(2015, 2), rep(2018, 2), rep(2022, 2), rep(NA, 31)),
                    Latest = c(rep(2023, 4), rep(2022, 6), rep(2021, 10), rep(2020, 3), rep(2019, 6), rep(2018, 3), rep(2017, 3), rep(2013, 4), rep(NA, 2)),
                    Current = c(rep(2023, 10), rep(2022, 12), rep(2021, 11), rep(2020, 1), rep(NA, 7)))


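# format data for sankey diagram:
# table() counts each Earlier/Latest/Current combination (rows containing an NA
# are dropped by default), melt() turns the counts into a long data frame with
# a `value` column, and gather_set_data() adds the id, x and y columns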
df <- table(Years) |> 
  reshape2::melt() |> 
  gather_set_data(1:3)

# plot
df |> 
  ggplot(aes(x = x, id = id, split = y, value = value)) +
  geom_parallel_sets(alpha = 0.3, axis.width = 0.1, sep = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1, sep = 0.1) +
  geom_parallel_sets_labels(color = "white", angle = 0,
                            axis.width = 0.1, sep = 0.1)

Created on 2023-11-10 with reprex v2.0.2
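The question also asks about filling the links by year. As a hedged sketch only, the usual ggforce pattern is to map fill inside geom_parallel_sets() to one of the original columns. The example below is not part of the answer above: it assumes a small made-up data frame (`toy`, hypothetical), colors the links by the Earlier year, and swaps reshape2::melt() for as.data.frame() so that the year columns come out as factors and get a discrete color scale.

```r
library(ggplot2)
library(ggforce)

# Hypothetical toy data without NAs (not the asker's data)
toy <- data.frame(
  Earlier = c(2012, 2012, 2013, 2015, 2015),
  Latest  = c(2023, 2022, 2022, 2021, 2021),
  Current = c(2023, 2023, 2022, 2022, 2021)
)

# Count each combination and keep only the combinations that occur
counts <- as.data.frame(table(toy), responseName = "value")
counts <- counts[counts$value > 0, ]

# Long format for geom_parallel_sets(): adds the id, x and y columns
long <- gather_set_data(counts, 1:3)

p <- ggplot(long, aes(x = x, id = id, split = y, value = value)) +
  # fill is mapped to the Earlier year, so each link takes the color
  # of the year it starts from
  geom_parallel_sets(aes(fill = Earlier), alpha = 0.3,
                     axis.width = 0.1, sep = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1, sep = 0.1) +
  geom_parallel_sets_labels(colour = "white", angle = 0, size = 3,
                            axis.width = 0.1, sep = 0.1) +
  theme_minimal()
p
```

Mapping fill to Latest or Current instead would color the links by those columns; which one reads best depends on the story the diagram is meant to tell.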