Calculating running amounts with uneven number of dates

Question

Calculating running amounts with uneven number of dates

37 Views Asked by user23687835 At 26 March 2024 at 12:18

I have a table full of dates and reply-to times for technical support tickets, and I'd like to calculate a running rate to find the average reply time over the previous n number of days. The data is in the following format.

Dates	Reply Time	Ticket ID
2024-01-02	341	1
2024-01-02	31	2
2024-01-03	321	3
2024-01-05	412	4
2024-01-07	93	5
2024-01-07	169	6

I can solve this problem by calculating the average reply time each day and then calculating the average reply time over the previous n number of days from that, but this doesn't take into account the number of observations each day, which will skew the results if there are outliers for certain days. I want to account for the number of observations when I'm averaging to prevent outliers from throwing off the data.

Here I'm using the package runner to get the average reply time each day and calculating a moving average from that.

daily_reply_time <- df_replies %>%
  filter(!is.na(reply_time) & !is.na(dates)) %>%
  group_by(dates) %>%
  reframe(avg_reply_time = mean(reply_time, na.rm = TRUE)) %>%
  mutate(
    x = "x",
    dates = lubridate::ymd(dates)
  ) %>%
  filter(!is.na(dates)) %>%
  complete(
    nesting(x),
    dates = seq(min(dates), max(dates), by = "day")
  ) %>%
  group_by(x) %>%
  arrange(dates) %>%
  mutate(
    dates= lubridate::ymd(dates),
    avg_reply_time = ifelse(is.na(avg_reply_time), 0, as.numeric(avg_reply_time )),
    running_reply_time_30_days = runner::mean_run(x = avg_reply_time, k = 30, idx = dates)
  ) %>%
  select(-x)

I create a dummy variable x so that nesting works properly; I assume there's a way to skip over that I'm unaware of. Anyways, this would give me averages of 186, 321, 0, 412, 0, and 131 for each day, so when I use runner, I get a moving average of 175 on 2024-01-08, instead of the 227.83 that is expected when you just sum up all the numbers and divide by the number of observations.

If I skip grouping by each date and instead use the complete function, I get an error saying that "'from' must be a finite number." And not using complete and trying to use the runner package does not throw an error, but averages over the previous n rows in the dataset instead of the number of dates.

daily_reply_time <- df_replies %>%
  filter(created_at > '2023-12-31') %>%
  mutate(
    created_at = substr(created_at, 1, 10),
    first_reply_time_in_minutes = first_reply_time_in_minutes / 60
  ) %>%
  filter(!is.na(created_at) & !is.na(first_reply_time_in_minutes)) %>%
  mutate(x = "x") %>%
  complete( 
    nesting(x),
    created_at = seq(min(created_at), max(created_at), by = "day")
  )

Is there a way to account for the number of observations when calculating running amounts using runner, or through some other package?

EDIT: The expected output would contain one row for each date, and a moving average reply time over the previous n days, meaning the input and output would have different lengths (amounts are not supposed to match up with the table above, just an example of the expected output given a particular dataframe).

Dates	Moving Avg. Reply Time
2024-01-02	125
2024-01-03	108.3
2024-01-04	108.3
2024-01-05	137
2024-01-06	67
2024-01-07	251

Original Q&A

There are 1 best solutions below

**Johan Rosa** · Accepted Answer · 2024-03-26T13:16:47.930000

To do it I would use the roll... functions from the {zoo} package.

library(dplyr)
library(zoo)

First let’s generate some data. each day in january will be present at least once.

set.seed(123)
january_dates <- seq(as.Date("2024-01-01"), length.out = 31, by = "day")
duplicate_dates <- sample(
  x = seq(as.Date("2024-01-01"), length.out = 31, by = "day"),
  size = 29,
  replace = TRUE
)

data <- data.frame(
  ticket_id = 1:60,
  date = c(january_dates, duplicate_dates),
  reply_time = sample(1:300, size = 60, replace = TRUE)
)

head(data)
#>   ticket_id       date reply_time
#> 1         1 2024-01-01        137
#> 2         2 2024-01-02        254
#> 3         3 2024-01-03        211
#> 4         4 2024-01-04         78
#> 5         5 2024-01-05         81
#> 6         6 2024-01-06         43

Now let’s calculate the total_reply_time and the number_of_tickets by day.

summary <- data |>
  arrange(date) |>
  summarise(
    total_reply_time = sum(reply_time),
    number_of_tickets = n(),
    .by = date
  )

The last step is to get the weighted rolling average

summary |>
  mutate(
    rollsum_reply_time = zoo::rollsum(total_reply_time, k = 7, fill = NA, align = "right"),
    rollsum_tickerts = zoo::rollsum(number_of_tickets, k = 7, fill = NA, align = "right"),
    rolling_average = rollsum_reply_time / rollsum_tickerts
  )
#>          date total_reply_time number_of_tickets rollsum_reply_time
#> 1  2024-01-01              137                 1                 NA
#> 2  2024-01-02              254                 1                 NA
#> 3  2024-01-03              521                 3                 NA
#> 4  2024-01-04               78                 1                 NA
#> 5  2024-01-05              380                 3                 NA
#> 6  2024-01-06               43                 1                 NA
#> 7  2024-01-07              279                 2               1692
#> 8  2024-01-08              117                 2               1672
#> 9  2024-01-09              308                 2               1726
#> 10 2024-01-10              554                 3               1759
#> 11 2024-01-11               27                 2               1708
#> 12 2024-01-12              135                 1               1463
#> 13 2024-01-13              224                 1               1644
#> 14 2024-01-14              448                 3               1813
#> 15 2024-01-15              452                 2               2148
#> 16 2024-01-16              290                 1               2130
#> 17 2024-01-17               69                 1               1645
#> 18 2024-01-18              281                 2               1899
#> 19 2024-01-19              321                 3               2085
#> 20 2024-01-20              132                 2               1993
#> 21 2024-01-21              141                 1               1686
#> 22 2024-01-22              522                 3               1756
#> 23 2024-01-23              153                 1               1619
#> 24 2024-01-24              294                 1               1844
#> 25 2024-01-25              540                 4               2103
#> 26 2024-01-26              231                 3               2013
#> 27 2024-01-27              502                 3               2383
#> 28 2024-01-28              381                 2               2623
#> 29 2024-01-29               83                 2               2184
#> 30 2024-01-30              116                 1               2147
#> 31 2024-01-31              356                 2               2209
#>    rollsum_tickerts rolling_average
#> 1                NA              NA
#> 2                NA              NA
#> 3                NA              NA
#> 4                NA              NA
#> 5                NA              NA
#> 6                NA              NA
#> 7                12        141.0000
#> 8                13        128.6154
#> 9                14        123.2857
#> 10               14        125.6429
#> 11               15        113.8667
#> 12               13        112.5385
#> 13               13        126.4615
#> 14               14        129.5000
#> 15               14        153.4286
#> 16               13        163.8462
#> 17               11        149.5455
#> 18               11        172.6364
#> 19               13        160.3846
#> 20               14        142.3571
#> 21               12        140.5000
#> 22               13        135.0769
#> 23               13        124.5385
#> 24               13        141.8462
#> 25               15        140.2000
#> 26               15        134.2000
#> 27               16        148.9375
#> 28               17        154.2941
#> 29               16        136.5000
#> 30               16        134.1875
#> 31               17        129.9412

^{Created on 2024-03-26 with reprex v2.0.2}

Calculating running amounts with uneven number of dates

There are 1 best solutions below

Related Questions in R

Related Questions in DATE

Related Questions in TIDYVERSE

Related Questions in RUNNER

Trending Questions

Popular # Hahtags

Popular Questions