I am looking to create a basic Sankey diagram using ggalluvial that breaks down a set of observations by the type of edit made to each one. Please consider the following example:
library(data.table)
library(ggalluvial)
data <- data.table(
  "shiftid" = c(1,1,2,2,3,3,4,4,5,5,6,6),
  "version" = c(1,2,1,2,1,2,1,2,1,2,1,2),
  "employee" = c("A", "B", "C", "C", "D", "D", "E", "F", "G", "H", "I", NA),
  "starttime" = c(1,1,2,3,4,6,7,7,8,9,9,NA))
data_wide <- dcast(data, shiftid ~ version, value.var = c("employee", "starttime"))
data_wide[, `:=`(employee_change = fifelse(employee_1 != employee_2,1,0),
                 starttime_change = fifelse(starttime_1 != starttime_2,1,0))]
data_wide
looks like:
   shiftid employee_1 employee_2 starttime_1 starttime_2 employee_change starttime_change
     <num>     <char>     <char>       <num>       <num>           <num>            <num>
1:       1          A          B           1           1               1                0
2:       2          C          C           2           3               0                1
3:       3          D          D           4           6               0                1
4:       4          E          F           7           7               1                0
5:       5          G          H           8           9               1                1
6:       6          I       <NA>           9          NA              NA               NA
What I want is to show, at the shift level, how many shifts had characteristics changed. So the first "block" would be all 6 shifts. I want these observations to be broken down into: (1) No change, (2) Deleted (the very last row, where the second version does not exist), (3) Change to start time, (4) Change to the employee. A shift can change in multiple characteristics (shift 5 changes both its employee and its start time), so I would like that to be reflected in the Sankey diagram.
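To make the target shape concrete, here is a minimal sketch of one possible reshaping, not a definitive solution. It assumes a shift counts as "Deleted" when both second-version columns are NA, and it emits one flow per (shift, outcome) pair, so a shift with two edits contributes two flows. The names `deleted`, `emp_chg`, `start_chg`, `none`, `flows`, and `counts` are hypothetical helpers introduced only for this illustration.

```r
library(data.table)
library(ggplot2)
library(ggalluvial)

# Rebuild the wide table from the example so the sketch is self-contained.
data <- data.table(
  shiftid   = c(1,1,2,2,3,3,4,4,5,5,6,6),
  version   = c(1,2,1,2,1,2,1,2,1,2,1,2),
  employee  = c("A", "B", "C", "C", "D", "D", "E", "F", "G", "H", "I", NA),
  starttime = c(1,1,2,3,4,6,7,7,8,9,9,NA))
data_wide <- dcast(data, shiftid ~ version, value.var = c("employee", "starttime"))

# Logical flags per shift. Assumption: no second version at all means "deleted".
data_wide[, deleted   := is.na(employee_2) & is.na(starttime_2)]
data_wide[, emp_chg   := !deleted & employee_1 != employee_2]
data_wide[, start_chg := !deleted & starttime_1 != starttime_2]
data_wide[, none      := !(deleted | emp_chg | start_chg)]

# One row per (shift, outcome) pair, keeping only the outcomes that apply.
flows <- melt(data_wide[, .(shiftid, deleted, emp_chg, start_chg, none)],
              id.vars = "shiftid", variable.name = "outcome", value.name = "flag")
flows <- flows[flag == TRUE]

# Replace the flag names with readable labels.
labels <- c(deleted = "Deleted", emp_chg = "Employee changed",
            start_chg = "Start time changed", none = "No change")
flows[, outcome := unname(labels[as.character(outcome)])]

# Aggregate to one row per outcome and add the common left-hand block.
counts <- flows[, .N, by = outcome]
counts[, source := "All shifts"]

p <- ggplot(counts, aes(y = N, axis1 = source, axis2 = outcome)) +
  geom_alluvium(aes(fill = outcome)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Shifts", "Outcome"), expand = c(0.1, 0.1))
print(p)
```

Because shift 5 appears under both "Employee changed" and "Start time changed", the left-hand block in this sketch has the height of all flows rather than the number of distinct shifts. If the left block must equal the shift count exactly, an alternative would be mutually exclusive categories with a combined "Employee and start time changed" outcome.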
Thanks!