I have a dataset that presents a few challenges for transformation in preparation for creating a dumbbell plot:
- Single Date Groups: Some groups have only one date. In these cases, the start and end dates are the same, and `h_sequ` is `1`.
- Two Date Groups: Other groups have a clear start and end date, signified by `h_sequ` values of `1` and `2`. An example of this is group 12.
- Three Date Groups: There are also groups with three dates, where `h_sequ` takes values 1, 2, and 3, such as group 33.
- And also in group 33 there is a unique case where `h_sequ` has values of `1, 1, 2, 3`.
group h_sequ date
<int> <int> <date>
1 1 1 2012-03-27
2 1 1 2012-03-27
3 10 1 2016-10-25
4 10 1 2016-10-25
5 12 1 2021-06-25
6 12 2 2022-05-18
7 31 1 2019-11-28
8 31 1 2019-11-28
9 31 2 2021-03-24
10 33 1 2013-09-03
11 33 1 2013-09-03
12 33 2 2019-01-04
13 33 3 2020-07-28
14 35 1 2015-10-21
15 35 2 2017-06-28
data <- structure(list(group = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L,
31L, 33L, 33L, 33L, 33L, 35L, 35L), h_sequ = c(1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L), date = structure(c(15426,
15426, 17099, 17099, 18803, 19130, 18228, 18228, 18710, 15951,
15951, 17900, 18471, 16729, 17345), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L))
The main question is how to handle the date column so that all of these scenarios can be shown in one combined dumbbell plot. So far, I have summarised the minimum and maximum date for each group, but I need to adapt this approach to the structure of my data, where the number of dates per group varies.
This is what I have:
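library(ggplot2)
library(ggalt)
library(dplyr)
data %>%
  # one row per group with its earliest and latest date (.by needs dplyr >= 1.1.0)
  summarise(start_date = min(date), end_date = max(date), .by = group) %>%
  ggplot(aes(x = start_date, xend = end_date, y = group)) +
  # draws only the two end points, so intermediate and repeated dates are not shown
  geom_dumbbell(color = "red3", size = 3)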
I would probably manually dodge the co-occurring points and join the points with `geom_path`. This allows a complete display of all your data.
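The manual dodge described above could be sketched roughly as follows. This is only one possible reading of the idea, not a definitive implementation: the names `dodge_step`, `group_levels`, `y_base`, `n_dup` and `plot_data` are made up for illustration, the offset of 0.2 is an arbitrary choice, and the data frame is rebuilt here from the dates printed in the question so that the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Data rebuilt from the printed table in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c("2012-03-27", "2012-03-27", "2016-10-25", "2016-10-25",
                     "2021-06-25", "2022-05-18", "2019-11-28", "2019-11-28",
                     "2021-03-24", "2013-09-03", "2013-09-03", "2019-01-04",
                     "2020-07-28", "2015-10-21", "2017-06-28"))
)

dodge_step   <- 0.2                       # assumed vertical offset between duplicates
group_levels <- sort(unique(data$group))  # one y position per group

plot_data <- data %>%
  # baseline y position of each group: 1, 2, 3, ... in order of group id
  mutate(y_base = match(group, group_levels)) %>%
  group_by(group, date) %>%
  mutate(
    n_dup = n(),
    # spread points sharing the same group and date symmetrically around the baseline;
    # a single point gets an offset of 0
    y = y_base + (row_number() - (n_dup + 1) / 2) * dodge_step
  ) %>%
  ungroup() %>%
  arrange(group, h_sequ, date)            # geom_path connects points in row order

p <- ggplot(plot_data, aes(x = date, y = y, group = group)) +
  geom_path(colour = "grey60") +
  geom_point(aes(colour = factor(h_sequ)), size = 3) +
  scale_y_continuous(breaks = seq_along(group_levels), labels = group_levels) +
  labs(x = "date", y = "group", colour = "h_sequ")

p
```

With this layout, groups whose rows share a single date (such as groups 1 and 10) show two points stacked slightly above and below the baseline, while the repeated first date in groups 31 and 33 is dodged the same way before the path continues to the later dates.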