I am trying to visualize my data via a sankey diagram.
I have the following dataframe:
sankey1 <- structure(list(pat_id = c(10037, 10264, 10302, 10302, 10302,
10344, 10482, 10482, 10482, 10613, 10613, 10613, 10628, 10851,
11052, 11203, 11214, 11214, 11566, 11684, 11821, 11945, 11945,
11952, 11952, 12122, 12183, 12774, 13391, 13573, 13643, 14298,
14556, 14556, 14648, 14862, 14935, 14935, 14999, 15514, 15811,
16045, 16045, 16190, 16190, 16190, 16220, 16220, 16220, 16220
), contactnummer = c(1, 1, 1, 2, 3, 1, 1, 2, 3, 1, 2, 3, 1, 1,
1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,
1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 3, 1, 2, 3, 99), Combo2 = c(1,
1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1,
2, 4, 4, 1, 5, 1, 1, 1, 1, 3, 3, 1, 5, 1, 1, 3, 1, 1, 1, 1, 1,
3, 6, 3, 1, 1, 1, 1), treatment = c(99, 0, 0, 1, 1, 0, 99, 99,
99, 99, 99, 1, 1, 0, 1, 99, 99, 99, 0, 99, 99, 0, 0, 0, 1, 99,
99, 0, 0, 0, 0, 0, 1, 1, 1, 99, 99, 1, 0, 0, 1, 0, 0, 0, 1, 1,
99, 99, 99, 99)), row.names = c(NA, 50L), class = c("data.table",
"data.frame"))
# A tibble: 50 x 4
pat_id contactnummer Combo2 treatment
<dbl> <dbl> <dbl> <dbl>
1 10037 1 1 99
2 10264 1 1 0
3 10302 1 1 0
4 10302 2 1 1
5 10302 3 2 1
6 10344 1 1 0
7 10482 1 2 99
8 10482 2 1 99
9 10482 3 1 99
10 10613 1 1 99
The dataframe contains information about participants ("pat_id") who visit a GP. In each visit, or contact ("contactnummer"), the GP evaluates the combination of symptoms ("Combo2") and gives the participant a treatment ("treatment"). Some participants (not all) visit the GP for a second (or even third) contact. For each contact the GP evaluates the symptoms and gives a treatment.
The aim is to illustrate the path of these participants: which symptoms lead to which treatment, and at which contact. I hope to do this in a Sankey diagram (https://r-graph-gallery.com/321-introduction-to-interactive-sankey-diagram-2.html).
I aim to visualize it like this:
- to visualize each combination of symptoms with a certain color
- to visualize each treatment option (the nodes) with a certain color
Ideally the desired output would look like this:
or this:

I would like to have the combinations ("Combo2") as arrows, shown in a different colour per unique combination. These arrows should then lead to a treatment. I would then like them to continue: after contact 1, if an ID number has a second contact, the arrow shows which combination occurs after that treatment and which treatment it leads to in the second contact.
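As far as I understand, `sankeyNetwork()` from networkD3 has `LinkGroup` and `NodeGroup` arguments for colouring links and nodes by a grouping column. Below is a minimal sketch of that idea on made-up toy data. The node names (`start`, `c1_treat_0`, ...), the `combo` and `group` columns and the values are all hypothetical and only serve as illustration. It only shows the colouring; it does not by itself give the continuing per-participant arrows described above.

```r
# Hypothetical toy nodes: one artificial start node plus treatment nodes per contact
nodes <- data.frame(
  name  = c("start", "c1_treat_0", "c1_treat_1", "c2_treat_0", "c2_treat_1"),
  group = c("start", "treat_0", "treat_1", "treat_0", "treat_1"),
  stringsAsFactors = FALSE
)

# Hypothetical toy links: `combo` holds the symptom combination carried by each flow
links <- data.frame(
  source = c("start", "start", "start", "c1_treat_0", "c1_treat_1"),
  target = c("c1_treat_0", "c1_treat_1", "c1_treat_0", "c2_treat_1", "c2_treat_1"),
  combo  = c("combo_1", "combo_1", "combo_2", "combo_1", "combo_2"),
  value  = c(3, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# networkD3 expects zero-based node indices
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1

# Only build the widget if networkD3 is installed
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "source_id", Target = "target_id", Value = "value",
    NodeID = "name",
    LinkGroup = "combo",  # colour links by symptom combination
    NodeGroup = "group",  # colour nodes by treatment
    fontSize = 15
  )
}
```

In an interactive session, typing `p` afterwards displays the diagram. Group labels without spaces (such as `combo_1`) are used here on purpose, to keep the colour mapping simple.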
AFTER EDIT
With help from user s__, I've used the following script:
# load the packages used below
library(dplyr)
library(tidyr)
library(networkD3)

# reshape the data: the goal is to create a data.frame
# with sources and targets to feed the sankey
df <-
  sankey1 %>%
  # wide format to give the columns an order
  pivot_wider(id_cols = pat_id,
              names_from = contactnummer,
              values_from = c(Combo2, treatment),
              names_glue = "{contactnummer}_{.value}",
              names_sort = TRUE) %>%
  # put in a long format
  pivot_longer(!pat_id, names_to = 'variable', values_to = 'value') %>%
  # remove NAs
  filter(!is.na(value)) %>%
  # grouping and creating the source field by pat_id
  group_by(pat_id) %>%
  mutate(source = paste(substr(variable, 1, 15), value, sep = '_')) %>%
  # useful columns
  select(pat_id, source) %>%
  # arrange
  arrange(pat_id, source) %>%
  # adding by group the target column (the next source within the same pat_id)
  mutate(target = c(source[2:length(source)], NA))

# define source and target
links <- data.frame(source = df$source,
                    target = df$target) %>%
  filter(!is.na(target))

# getting unique nodes
nodes <- data.frame(name = as.character(unique(c(links$source, links$target))))

# now convert to character
links$source <- as.character(links$source)
links$target <- as.character(links$target)

# matching links and nodes, then shifting to zero-based indices
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1

# group by (we are counting the number of rows per source/target pair)
links <- links %>% group_by(source, target) %>% tally()

# plot it!
sankeyNetwork(Links = links,
              Nodes = nodes,
              Source = 'source',
              Target = 'target',
              Value = 'n',
              NodeID = 'name',
              fontSize = 15)
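To make the link-building step in the script easier to follow, here is a small base R sketch on made-up data (the `stages` data.frame and its values are hypothetical). Within each participant, every stage is paired with the stage that follows it, and identical pairs are then counted, which is roughly what the `mutate(target = ...)` and `tally()` steps do.

```r
# Hypothetical toy data: ordered stage labels per participant
stages <- data.frame(
  pat_id = c(1, 1, 1, 1, 2, 2),
  source = c("1_Combo2_1", "1_treatment_0", "2_Combo2_1", "2_treatment_1",
             "1_Combo2_1", "1_treatment_0"),
  stringsAsFactors = FALSE
)

# Within each participant, the target is the next stage (NA for the last stage)
stages$target <- ave(stages$source, stages$pat_id,
                     FUN = function(x) c(x[-1], NA))

# Drop the last stage of each participant, which has no target
pairs <- stages[!is.na(stages$target), c("source", "target")]

# Count how often each source/target pair occurs
links_toy <- aggregate(n ~ source + target,
                       data = cbind(pairs, n = 1),
                       FUN = sum)
```

Each row of `links_toy` is then one link, with `n` as its width. Note that this chains Combo2 and treatment as alternating nodes, which is why the combinations show up as nodes rather than as coloured arrows.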
This comes pretty close, but is not yet the desired output. I've tried editing the source, target and nodes like below, however that definitely isn't the desired output.
df <-
  sankey1 %>%
  # wide format to give the columns an order
  pivot_wider(id_cols = pat_id,
              names_from = contactnummer,
              values_from = c(Combo2, treatment),
              names_glue = "{contactnummer}_{.value}",
              names_sort = TRUE) %>%
  # put in a long format
  pivot_longer(!pat_id, names_to = 'variable', values_to = 'value') %>%
  # remove NAs
  filter(!is.na(value)) %>%
  # grouping and creating the source field by pat_id
  group_by(pat_id) %>%
  mutate(source = paste(substr(variable, 1, 15), value, sep = '_')) %>%
  # useful columns
  select(pat_id, source) %>%
  # arrange
  arrange(pat_id, source) %>%
  # number the stages within each pat_id
  mutate(number = ave(pat_id, FUN = seq_along)) %>%
  # one column per stage number
  pivot_wider(id_cols = pat_id, values_from = source, names_from = number)

# rename the numbered columns to Combo2/treatment per contact
names(df)[names(df) == '1'] <- 'Combo2_1'
names(df)[names(df) == '2'] <- 'treatment_1'
names(df)[names(df) == '3'] <- 'Combo2_2'
names(df)[names(df) == '4'] <- 'treatment_2'
names(df)[names(df) == '5'] <- 'Combo2_3'
names(df)[names(df) == '6'] <- 'treatment_3'

# back to long format: one row per pat_id and contact
df <- df %>%
  pivot_longer(!pat_id, names_to = c(".value", "contact"), names_sep = "_")
df <- df[!is.na(df$Combo2), ]
df <- df %>%
  select(pat_id, Combo2, treatment)

# use Combo2 as source and treatment as target
names(df)[names(df) == 'Combo2'] <- 'source'
names(df)[names(df) == 'treatment'] <- 'target'

# define source and target
links <- data.frame(source = df$source,
                    target = df$target) %>%
  filter(!is.na(target))

# getting unique nodes
nodes <- data.frame(name = as.character(unique(c(links$source, links$target))))

# now convert to character
links$source <- as.character(links$source)
links$target <- as.character(links$target)

# matching links and nodes, then shifting to zero-based indices
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1

# group by (we are counting the number of rows per source/target pair)
links <- links %>% group_by(source, target) %>% tally()

# plot it!
sankeyNetwork(Links = links,
              Nodes = nodes,
              Source = 'source',
              Target = 'target',
              Value = 'n',
              NodeID = 'name',
              fontSize = 15)
I really can't figure it out. Any help would be much appreciated!

After also being in contact with the current maintainer of the networkD3 package, I came to the conclusion that the outcome I aimed for is not possible with a Sankey diagram.