Prepare date data for dumbbell plot

Question


Asked by TarJae on 28 July 2025 at 21:07

I have a dataset that poses a few challenges when transforming it for a dumbbell plot:

Single-date groups: Some groups have only one distinct date, possibly repeated over several rows (e.g. groups 1 and 10). In these cases the start and end dates are the same and h_sequ is 1.
Two-date groups: Other groups have a clear start and end date, marked by h_sequ values of 1 and 2, such as group 12.
Three-date groups: There are also groups with three dates, where h_sequ takes the values 1, 2 and 3, such as group 33.
Repeated rows: Some groups contain the same h_sequ and date more than once. Group 33, for example, has h_sequ values 1, 1, 2, 3, and group 31 has 1, 1, 2.
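One way to check which of these cases each group falls into is to count rows, distinct dates and repeated h_sequ values per group. This is a minimal sketch, assuming dplyr 1.1.0 or later (for `.by`) and R 4.1 or later (for the native pipe); `group_shape` is just an illustrative name, and the data are re-typed from the printout above.

```r
library(dplyr)

# Same 15 rows as the dput() output in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c("2012-03-27", "2012-03-27", "2016-10-25", "2016-10-25",
                     "2021-06-25", "2022-05-18", "2019-11-28", "2019-11-28",
                     "2021-03-24", "2013-09-03", "2013-09-03", "2019-01-04",
                     "2020-07-28", "2015-10-21", "2017-06-28"))
)

# One row per group describing its shape
group_shape <- data |>
  summarise(
    n_rows      = n(),                      # total rows, including repeats
    n_dates     = n_distinct(date),         # distinct dates = points on the dumbbell
    max_sequ    = max(h_sequ),              # highest h_sequ value in the group
    has_repeats = any(duplicated(h_sequ)),  # TRUE when an h_sequ value occurs twice
    .by = group
  )
group_shape
```

In this sample the number of distinct dates always equals the highest h_sequ, and every repeated row is an exact copy (same h_sequ and date) of an earlier row.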

 group h_sequ date      
   <int>  <int> <date>    
 1     1      1 2012-03-27
 2     1      1 2012-03-27
 3    10      1 2016-10-25
 4    10      1 2016-10-25
 5    12      1 2021-06-25
 6    12      2 2022-05-18
 7    31      1 2019-11-28
 8    31      1 2019-11-28
 9    31      2 2021-03-24
10    33      1 2013-09-03
11    33      1 2013-09-03
12    33      2 2019-01-04
13    33      3 2020-07-28
14    35      1 2015-10-21
15    35      2 2017-06-28

data <- structure(list(group = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 
31L, 33L, 33L, 33L, 33L, 35L, 35L), h_sequ = c(1L, 1L, 1L, 1L, 
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L), date = structure(c(15426, 
15426, 17099, 17099, 18803, 19130, 18228, 18228, 18710, 15951, 
15951, 17900, 18471, 16729, 17345), class = "Date")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L))

The main question is how to prepare the date column so that all of these cases can be shown together in a single dumbbell plot. I have summarised each group to its minimum and maximum date, which gives the two ends of each dumbbell, but that discards the middle date of three-date groups and ignores the repeated rows. I need an approach that accounts for the varying number of dates per group.
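To see exactly what the min/max summary loses, the distinct (group, h_sequ, date) rows can be compared against each group's start and end date. A minimal sketch, assuming dplyr 1.1.0 or later; `ends` mirrors the summary used in the question and `dropped` is an illustrative name.

```r
library(dplyr)

# Same 15 rows as the dput() output in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c("2012-03-27", "2012-03-27", "2016-10-25", "2016-10-25",
                     "2021-06-25", "2022-05-18", "2019-11-28", "2019-11-28",
                     "2021-03-24", "2013-09-03", "2013-09-03", "2019-01-04",
                     "2020-07-28", "2015-10-21", "2017-06-28"))
)

# The summary from the question: one start/end pair per group
ends <- data |>
  summarise(start_date = min(date), end_date = max(date), .by = group)

# Distinct points that are neither the start nor the end of their group
dropped <- data |>
  distinct(group, h_sequ, date) |>
  left_join(ends, by = "group") |>
  filter(date != start_date, date != end_date)
dropped
```

For this sample only group 33's middle date (h_sequ 2, 2019-01-04) disappears; the repeated rows are exact duplicates, so they only matter if you want them to stay visible as separate points.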

So far I have this:

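library(ggplot2)
library(ggalt)   # provides geom_dumbbell()
library(dplyr)

# Collapse each group to its earliest and latest date, then draw one dumbbell per group
data %>%
  summarise(start_date = min(date), end_date = max(date), .by = group) %>%
  ggplot(aes(x = start_date, xend = end_date, y = group)) +
  geom_dumbbell(color = "red3", size = 3)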
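An alternative preparation (not from the question or the answers below) is to keep every distinct date as a point and draw one segment between each pair of consecutive dates, so a three-date group becomes a chain of two segments instead of a single min-to-max bar. This sketch uses only ggplot2 and dplyr rather than ggalt, assumes ggplot2 3.4.0 or later for `linewidth`, and the names `point_df` and `segment_df` are hypothetical. It collapses exact duplicate rows, so repeated rows are not shown separately.

```r
library(dplyr)
library(ggplot2)

# Same 15 rows as the dput() output in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c("2012-03-27", "2012-03-27", "2016-10-25", "2016-10-25",
                     "2021-06-25", "2022-05-18", "2019-11-28", "2019-11-28",
                     "2021-03-24", "2013-09-03", "2013-09-03", "2019-01-04",
                     "2020-07-28", "2015-10-21", "2017-06-28"))
)

# One point per distinct date in each group (exact duplicate rows collapsed)
point_df <- data |>
  distinct(group, h_sequ, date) |>
  arrange(group, date)

# One segment per pair of consecutive dates within a group;
# single-date groups get no segment and appear only as a point
segment_df <- point_df |>
  mutate(next_date = lead(date), .by = group) |>
  filter(!is.na(next_date))

p <- ggplot() +
  geom_segment(data = segment_df,
               aes(x = date, xend = next_date, y = group, yend = group),
               color = "red3", linewidth = 3) +
  geom_point(data = point_df,
             aes(x = date, y = group, fill = factor(h_sequ)),
             shape = 21, size = 4) +
  labs(fill = "h_sequ")
p
```

Whether collapsing duplicates is acceptable depends on whether the repeated rows carry meaning; if they do, one of the dodging approaches below keeps them visible.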

Original Q&A


Gregor Thomas on 04 January 2024 at 20:45

I would suggest building your own dumbbells using geom_segment and geom_point:

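library(dplyr)
library(ggplot2)

# One row per group: earliest and latest date, used for the connecting bar
ends = data |> summarise(start_date = min(date), end_date = max(date), .by = group)
# Within each (group, h_sequ), shift every repeated row up by 0.5 so duplicates stay visible
data = data |> mutate(nudge = (row_number() - 1) / 2, .by = c(group, h_sequ))

ggplot(ends, aes(y = group)) +
  # The bar: one segment from start_date to end_date per group
  geom_segment(
    aes(x = start_date, xend = end_date, y = group, yend = group),
    color = "red3", linewidth = 3
  ) +
  # The dots: every original row, filled by h_sequ and nudged if repeated
  geom_point(
    data = data,
    aes(x = date, y = group + nudge, fill = factor(h_sequ)),
    color = "gray20",
    shape = 21,
    alpha = 0.9,
    size = 4
  )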
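To see what the `nudge` column does on this data, the snippet below recomputes it with the same rule and prints it. Only the second copy of each repeated row (in groups 1, 10, 31 and 33) gets a non-zero nudge, so it is drawn 0.5 units above its twin. Because the y axis is continuous here and the closest groups (10 and 12, 31 and 33, 33 and 35) are 2 units apart, a 0.5 shift stays clear of the neighbouring group. A sketch assuming dplyr 1.1.0 or later; `nudged` is an illustrative name.

```r
library(dplyr)

# Same 15 rows as the dput() output in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c("2012-03-27", "2012-03-27", "2016-10-25", "2016-10-25",
                     "2021-06-25", "2022-05-18", "2019-11-28", "2019-11-28",
                     "2021-03-24", "2013-09-03", "2013-09-03", "2019-01-04",
                     "2020-07-28", "2015-10-21", "2017-06-28"))
)

# Same rule as the answer: first row of each (group, h_sequ) stays put,
# each further row moves up by another 0.5
nudged <- data |>
  mutate(nudge = (row_number() - 1) / 2, .by = c(group, h_sequ))
nudged$nudge
```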

**Allan Cameron** · Accepted Answer

I would probably manually dodge the co-occurring points (rows that share a group and date) and join the points with geom_path. This allows a complete display of all your data.

library(tidyverse)

data %>% 
  # A factor gives one evenly spaced row per group on the y axis
  mutate(group = factor(group)) %>%
  # Spread points that share a group and date symmetrically around their row
  mutate(dodge = (row_number() - median(row_number())) / n() / 3.2,
         .by = c(group, date)) %>%
  ggplot(aes(date, group)) +
  # Connect each group's dates in data order
  geom_path(linewidth = 3, color = "gray") +
  geom_point(aes(y = as.numeric(group) + dodge, fill = factor(h_sequ)),
             shape = 21, size = 5) +
  scale_fill_manual("h_sequ", values = c("orange", "deepskyblue4", "red4")) +
  theme_minimal(base_size = 16)
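The `dodge` term centres each set of n co-occurring points on its group's row: row i gets the offset (i - (n + 1)/2) / n / 3.2, so neighbouring points are 1/(3.2n) apart and the whole set always stays within about ±0.16 of the row's centre line. For the pairs in this data that is ±0.5 / 2 / 3.2 = ±0.078125, and single points get 0. The sketch below recomputes the offsets and the resulting y positions; it assumes dplyr 1.1.0 or later, and `dodged` and `y_pos` are illustrative names.

```r
library(dplyr)

# Same 15 rows as the dput() output in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c("2012-03-27", "2012-03-27", "2016-10-25", "2016-10-25",
                     "2021-06-25", "2022-05-18", "2019-11-28", "2019-11-28",
                     "2021-03-24", "2013-09-03", "2013-09-03", "2019-01-04",
                     "2020-07-28", "2015-10-21", "2017-06-28"))
)

dodged <- data |>
  mutate(group = factor(group)) |>
  mutate(dodge = (row_number() - median(row_number())) / n() / 3.2,
         .by = c(group, date))

# On a discrete axis each factor level sits at its index (1, 2, ...),
# so adding the dodge gives the plotted y position of each point
dodged$y_pos <- as.numeric(dodged$group) + dodged$dodge
dodged[, c("group", "date", "dodge", "y_pos")]
```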
