Can a tsibble object have multiple rows with the same date and associated row values?

I have a data frame that I am converting into a tsibble time series object to make time series graphing and manipulation (rolling time window analysis) easier. I receive new data daily that I would like to append to the original data frame. The original data is represented as df and the new incoming data as df2. I can convert each data frame into a tsibble independently, but when I first join them with rbind() and then call as_tsibble(), I get an error.

as_tsibble(final_df, index = date, key = ticker)

Error: A valid tsibble must have distinct rows identified by key and index.
i Please use duplicates() to check the duplicated rows.

To set up the problem, here is the code for a reprex.

df <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
             buy_price = c(62.00, 68.00, 37.00, 55.00, 41.00),
             sale_price = c(64.00, 71.00, 42.00, 60.00, 45.00),
             close_price = c(63.00, 70.00, 38.00, 56.00, 43.00),
             date = c(as.Date("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")))

df2 <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
                 buy_price = c(63.00, 69.00, 38.00, 53.00, 44.00),
                 sale_price = c(66.00, 77.00, 47.00, 63.00, 48.00),
                 close_price = c(65.00, 74.00, 39.00, 55.00, 45.00),
                 date = c(as.Date("April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021")))

final_df <- rbind(df,df2)
str(final_df)
> 'data.frame': 10 obs. of  5 variables:

as_tsibble(final_df, index = date, key = ticker)

Also, after running as_tsibble(final_df, index = date, key = ticker), the rows are reordered alphabetically by ticker, whereas I would like to preserve the original order (a separate question).

I am unable to create a tsibble with final_df, although a tsibble can be created individually on df and df2.

Am I missing something or is it impossible to have a tsibble object with multiple rows of the same ticker name?

Best answer

A tsibble must have a unique time point (the index) for each observation in a time series, where each time series is identified by the key.
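
To see which rows violate this rule before calling as_tsibble(), a base R check on the key and index columns is enough. The following is a minimal sketch on a small made-up data frame (toy and its values are illustrative, not the question's data). Note that different tickers sharing a date is fine; only a repeated ticker/date pair is a problem.

```r
# Illustrative data: AAPL and SPX share a date (allowed),
# but AAPL appears twice on that date (not allowed)
toy <- data.frame(
  ticker = c("AAPL", "SPX", "AAPL"),
  date = as.Date(c("2021-04-29", "2021-04-29", "2021-04-29")),
  close_price = c(38, 56, 39)
)

# Flag every row whose (key, index) pair occurs more than once
key_index <- toy[c("ticker", "date")]
is_dup <- duplicated(key_index) | duplicated(key_index, fromLast = TRUE)

# Show the offending rows (both AAPL rows)
toy[is_dup, ]
```

tsibble's own duplicates(toy, key = ticker, index = date), which the error message points to, should report the same rows.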

The datasets you constructed for your MRE appear to satisfy the uniqueness requirement, but the conversion to Date does not give the intended result. For example, the index variable in df evaluates to a single, wrong date:

as.Date("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")
#> [1] "2021-05-06"
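
The likely reason is that as.Date() converts only its first argument; the second positional argument is matched as the format. Because that format contains no conversion specifications such as %d or %Y, strptime() fills in the current date, which is why the output matches the day the code was run. A minimal sketch of the effect, with the format named explicitly (one_value and recycled are illustrative names):

```r
# The second string acts as `format`, so only ONE value is converted
one_value <- as.Date("April 29th, 2021", format = "April 29th, 2021")
length(one_value)

# data.frame() recycles that single value across all rows, so every
# ticker gets the same date, and df and df2 end up sharing it too
recycled <- data.frame(ticker = c("AAPL", "SPX"), date = one_value)
length(unique(recycled$date))
```

Both lengths are 1, so after rbind() each ticker appears twice with an identical date, which is exactly what the tsibble error reports.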

To correctly parse "April 29th, 2021" you could use the {lubridate} package's mdy() function:

lubridate::mdy("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")
#> [1] "2021-04-29" "2021-04-29" "2021-04-29" "2021-04-29" "2021-04-29"
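
If adding a dependency is not desirable, base R can parse the same strings once the ordinal suffix is removed. This is only a sketch and assumes an English locale, because %B matches month names in the current locale; raw, no_suffix and parsed are illustrative names.

```r
raw <- c("April 29th, 2021", "April 30th, 2021")

# Drop the ordinal suffix (st, nd, rd, th) that follows the day number
no_suffix <- sub("([0-9]+)(st|nd|rd|th)", "\\1", raw)

# %B = full month name, %d = day of month, %Y = four-digit year
parsed <- as.Date(no_suffix, format = "%B %d, %Y")
parsed
```

Passing the strings as one character vector, rather than as separate arguments, is what keeps one date per row.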

Once the dates are parsed correctly, the issue is resolved and the tsibble can be created.

library(tsibble)
library(lubridate)
df <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
                 buy_price = c(62.00, 68.00, 37.00, 55.00, 41.00),
                 sale_price = c(64.00, 71.00, 42.00, 60.00, 45.00),
                 close_price = c(63.00, 70.00, 38.00, 56.00, 43.00),
                 date = mdy(c("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")))

df2 <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
                  buy_price = c(63.00, 69.00, 38.00, 53.00, 44.00),
                  sale_price = c(66.00, 77.00, 47.00, 63.00, 48.00),
                  close_price = c(65.00, 74.00, 39.00, 55.00, 45.00),
                  date = mdy(c("April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021")))

final_df <- rbind(df,df2)
as_tsibble(final_df, index = date, key = ticker)
#> # A tsibble: 10 x 5 [1D]
#> # Key:       ticker [5]
#>    ticker buy_price sale_price close_price date      
#>    <chr>      <dbl>      <dbl>       <dbl> <date>    
#>  1 AAPL          37         42          38 2021-04-29
#>  2 AAPL          38         47          39 2021-04-30
#>  3 BNO           41         45          43 2021-04-29
#>  4 BNO           44         48          45 2021-04-30
#>  5 SPX           55         60          56 2021-04-29
#>  6 SPX           53         63          55 2021-04-30
#>  7 UST10Y        62         64          63 2021-04-29
#>  8 UST10Y        63         66          65 2021-04-30
#>  9 UST2Y         68         71          70 2021-04-29
#> 10 UST2Y         69         77          74 2021-04-30

Created on 2021-05-06 by the reprex package (v1.0.0)