Group Time Series OHLC Data by chosen period in R


Several functions in the R packages xts and zoo aggregate financial OHLC(V) data from finer to coarser granularities, and the newer tibbletime::to_period does the same for a tibble. All of them, however, share the same limitation: when aggregating by, say, one hour, they use round clock times as the interval boundaries, i.e. 8 AM, 9 AM, 10 AM, ... If I have data with 15-minute candles, how can I aggregate OHLC(V) into 1-hour intervals whose boundaries are not tied to the round clock times?

Time <- seq(from = as.POSIXct("2018-12-28 12:00:00"), to = as.POSIXct("2019-01-02 13:30:00"), by = 900)

Price_Data <- tibble::tibble(Time = Time, 
                             Open = 100 + rnorm(n = length(Time)), 
                             High = 100 + rnorm(n = length(Time)),                   
                             Low = 100 + rnorm(n = length(Time)),                   
                             Close = 100 + rnorm(n = length(Time)),                   
                             Volume = rpois(n = length(Time), lambda = 5000))
tail(Price_Data)

1 2019-01-02 12:15:00  99.7   5074
2 2019-01-02 12:30:00  99.9   4925
3 2019-01-02 12:45:00 101.    5070
4 2019-01-02 13:00:00  98.6   4919
5 2019-01-02 13:15:00  98.6   4925
6 2019-01-02 13:30:00  99.5   5046

How can I aggregate the above tibble to 30M, 1H, 2H and 4H so that every group has the desired length? For example, when aggregating by 1H the last group would take the 4 candles from 12:45:00 to 13:30:00, when aggregating by 2H the 8 candles from 11:45:00 to 13:30:00, and so on. I have tried

library(magrittr)  # provides the %>% pipe

purrr::map(c("30 M", "1 H", "2 H", "4 H"), function(Period) Price_Data %>%
          na.omit() %>% tibbletime::tbl_time(., index = Time) %>%
          tibbletime::collapse_by(Period, side = "end", clean = TRUE) %>%
          dplyr::group_by(Time) %>%
          dplyr::mutate(Open = dplyr::first(Open),
                        High = max(High),
                        Low  = min(Low),
                        Close = dplyr::last(Close),
                        Volume = sum(Volume)) %>%
          # keep only the last row of each collapsed group
          dplyr::slice(dplyr::n()) %>% dplyr::ungroup())

with various combinations of parameters, but nothing produces the desired result. Grouping by the number of candles in a specific interval does not help either, as real-world data has gaps.
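To make the desired grouping concrete, here is a minimal sketch of one possible approach, not a tested solution. It assumes the buckets should be anchored at the last timestamp and counted backwards in elapsed seconds, so gaps in the data do not shift the boundaries. The function name `aggregate_ohlcv_anchored`, its `period_secs` and `anchor` parameters, and the `Candles` column are hypothetical names introduced for illustration, and the example data is generated in UTC only to keep it reproducible.

```r
library(dplyr)

# Hypothetical helper: aggregate OHLCV candles into buckets of `period_secs`
# seconds whose right edges are `anchor`, `anchor - period_secs`, ...
# Assumes the columns Time (POSIXct), Open, High, Low, Close and Volume.
aggregate_ohlcv_anchored <- function(data, period_secs, anchor = max(data$Time)) {
  data %>%
    mutate(
      # seconds between each candle and the anchor (0 for the last candle)
      secs_back  = as.numeric(difftime(anchor, Time, units = "secs")),
      # bucket 0 holds candles less than one period before the anchor, etc.
      bucket     = floor(secs_back / period_secs),
      # label every bucket with the timestamp of its right edge
      Bucket_End = anchor - bucket * period_secs
    ) %>%
    arrange(Time) %>%                 # first()/last() rely on time order
    group_by(Bucket_End) %>%
    summarise(
      Open    = first(Open),
      High    = max(High),
      Low     = min(Low),
      Close   = last(Close),
      Volume  = sum(Volume),
      Candles = n(),                  # how many candles fell into the bucket
      .groups = "drop"
    ) %>%
    rename(Time = Bucket_End)
}

# Example data in the same shape as in the question (UTC is an assumption)
set.seed(1)
Time <- seq(from = as.POSIXct("2018-12-28 12:00:00", tz = "UTC"),
            to   = as.POSIXct("2019-01-02 13:30:00", tz = "UTC"),
            by   = 900)
Price_Data <- tibble(Time   = Time,
                     Open   = 100 + rnorm(length(Time)),
                     High   = 100 + rnorm(length(Time)),
                     Low    = 100 + rnorm(length(Time)),
                     Close  = 100 + rnorm(length(Time)),
                     Volume = rpois(length(Time), lambda = 5000))

# Period lengths in seconds for 30M, 1H, 2H and 4H
periods <- c(`30M` = 1800, `1H` = 3600, `2H` = 7200, `4H` = 14400)
aggregated <- lapply(periods, function(p) aggregate_ohlcv_anchored(Price_Data, p))
tail(aggregated[["1H"]])
```

Because the bucket index comes from elapsed time rather than from a row count, a missing candle only lowers `Candles` for its bucket. The oldest bucket may be incomplete when the data does not divide evenly into the period.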
