Convert Daily Data into Weekly Data and summarize multiple columns in R

1.1k Views Asked by At

I want to change the following data set :

date          A   B
01/01/2018  391 585
02/01/2018  420 595
03/01/2018  455 642
04/01/2018  469 654
05/01/2018  611 900
06/01/2018  449 640
07/01/2018  335 522
08/01/2018  726 955
09/01/2018  676 938
10/01/2018  508 740
11/01/2018  562 778
12/01/2018  561 761
13/01/2018  426 609
14/01/2018  334 508

My desired output is as follows:

date           A       B
07/01/2018  3130    4538
14/01/2018  3793    5289

In which, the numbers of Columns A and B are summed over seven days of weeks. Indeed, I want to convert daily data to weekly data. I found two solutions on the Stackoverflow website. One solution would be using library(tidyquant), and following code

library(tidyquant)
newfd<-df %>%
  tq_transmute(select     = A,
               mutate_fun = apply.weekly,
               FUN        = sum)

The code produces weekly data for Column A while I need all columns. (I have many columns). I also used the following code. But, I don’t know how I can develop the code for all columns.

library(slider)   
slide_period_dfr(.x = califo, .i=as.Date(califo$date), 
                 .period = "week", 
                 .f = ~data.frame(week_ending = tail(.x$ date,1),
                                  week_freq = sum(.x$A)),
                 .origin = as.Date("2018-01-01"))
3

There are 3 best solutions below

0
On BEST ANSWER

You can compute an index id = 0:(nrow(df) - 1), once the series is arranged by date, and use that to define to which period (week) belongs each date week = id %/% 7. Date linked to each week is chosen in date = max(date) to be the last date of the week. Other options are possible.

library(dplyr)
library(lubridate)

df <- tribble(~date, ~A, ~B,
"01/01/2018", 391, 585,
"02/01/2018", 420, 595,
"03/01/2018", 455, 642,
"04/01/2018", 469, 654,
"05/01/2018", 611, 900,
"06/01/2018", 449, 640,
"07/01/2018", 335, 522,
"08/01/2018", 726, 955,
"09/01/2018", 676, 938,
"10/01/2018", 508, 740,
"11/01/2018", 562, 778,
"12/01/2018", 561, 761,
"13/01/2018", 426, 609,
"14/01/2018", 334, 508)

df %>%
  mutate(date = dmy(date)) %>% 
  arrange(date) %>% 
  mutate(id = 0:(nrow(df) - 1), week = id %/% 7) %>%
  group_by(week) %>% 
  summarize(date = max(date), across(A:B, sum))

#> # A tibble: 2 x 4
#>    week date           A     B
#>   <dbl> <date>     <dbl> <dbl>
#> 1     0 2018-01-07  3130  4538
#> 2     1 2018-01-14  3793  5289

Created on 2021-06-05 by the reprex package (v0.3.0)

0
On

You could pivot_longer() so that you have only one column of data to transform, apply the function to that column, and then pivot_wider().

Here is a trivial example with mtcars:

library(tidyverse)
mtcars %>%
  rownames_to_column(var = "car") %>% 
  select(car, mpg, cyl) %>% 
  pivot_longer(cols = c(mpg, cyl), names_to = "var") %>% 
  mutate(value = value^2) %>% 
  pivot_wider(names_from = var, names_prefix = "squared_")

# A tibble: 32 x 3
   car               squared_mpg squared_cyl
   <chr>                   <dbl>       <dbl>
 1 Mazda RX4                441           36
 2 Mazda RX4 Wag            441           36
 3 Datsun 710               520.          16
 4 Hornet 4 Drive           458.          36
 5 Hornet Sportabout        350.          64
 6 Valiant                  328.          36
 7 Duster 360               204.          64
 8 Merc 240D                595.          16
 9 Merc 230                 520.          16
10 Merc 280                 369.          36
# … with 22 more rows

Your aggregation would replace my mutate step.

Whether this is more concise than repeatedly creating new variables depends on how many variables you are dealing with.

2
On

You can use ceiling_date to make the dates to weekly dates and sum multiple variables with across in dplyr.

library(dplyr)
library(lubridate)

df %>%
  group_by(date = ceiling_date(dmy(date), 'week', week_start = 1)) %>%
  summarise(across(A:B, sum))

#  date           A     B
#  <date>     <int> <int>
#1 2018-01-08  3130  4538
#2 2018-01-15  3793  5289

data

df <- structure(list(date = c("01/01/2018", "02/01/2018", "03/01/2018", 
"04/01/2018", "05/01/2018", "06/01/2018", "07/01/2018", "08/01/2018", 
"09/01/2018", "10/01/2018", "11/01/2018", "12/01/2018", "13/01/2018", 
"14/01/2018"), A = c(391L, 420L, 455L, 469L, 611L, 449L, 335L, 
726L, 676L, 508L, 562L, 561L, 426L, 334L), B = c(585L, 595L, 
642L, 654L, 900L, 640L, 522L, 955L, 938L, 740L, 778L, 761L, 609L, 
508L)), class = "data.frame", row.names = c(NA, -14L))