Is there an elegant way of filling in missing time periods, as timetk::pad_by_time and tsibble::fill_gaps do, in data.table?

The data might look like this:
library(data.table)
data <- data.table(
  Date = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-02-01",
           "2020-02-01", "2020-03-01", "2020-03-01", "2020-03-01"),
  Card = c(1, 2, 3, 1, 3, 1, 2, 3),
  A = rnorm(8)
)
The observation for card 2 at 2020-02-01 is implicitly missing.

With the tsibble package, you can do the following:
library(tsibble)
library(lubridate)  # for ymd()
data <- data[, .(Date = yearmonth(ymd(Date)),
                 Card = as.character(Card),
                 A = as.numeric(A))]
data <- as_tsibble(data, key = Card, index = Date)
data <- fill_gaps(data)
With the timetk package, you can do the following:
library(timetk)
library(dplyr)      # for %>%, group_by(), ungroup()
library(lubridate)  # for ymd()
data <- data[, .(Date = ymd(Date),
                 Card = as.character(Card),
                 A = as.numeric(A))]
data <- data %>%
  group_by(Card) %>%
  pad_by_time(Date, .by = "month") %>%
  ungroup()
Just data.table:

If no key is set, then
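A minimal sketch of what this could look like, assuming the sample data from the question: cross join the unique Date and Card values with CJ and join the data onto that grid. This is a common data.table idiom, not necessarily the exact code originally posted here.

```r
library(data.table)

# Sample data from the question (Card 2 has no row for 2020-02-01)
data <- data.table(
  Date = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-02-01",
           "2020-02-01", "2020-03-01", "2020-03-01", "2020-03-01"),
  Card = c(1, 2, 3, 1, 3, 1, 2, 3),
  A    = rnorm(8)
)

# CJ() builds every combination of the unique Date and Card values;
# joining `data` onto that grid leaves A as NA where no row existed
filled <- data[CJ(Date = Date, Card = Card, unique = TRUE), on = .(Date, Card)]
filled
```

The result has one row per Date/Card combination, so the gap for Card 2 at 2020-02-01 becomes an explicit row with A = NA.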
(updated/simplified, thanks to @sindri_baldur!)
If a key is set, then you can use @Frank's method:
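The keyed variant might be sketched as follows. The snippet originally shown here is not available, so this is an assumption about its general shape: the grid is built from whatever columns form the key, and the join then needs no on= argument. The names key_cols and grid are illustrative only.

```r
library(data.table)

# Sample data from the question, this time with a key on Date and Card
data <- data.table(
  Date = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-02-01",
           "2020-02-01", "2020-03-01", "2020-03-01", "2020-03-01"),
  Card = c(1, 2, 3, 1, 3, 1, 2, 3),
  A    = rnorm(8)
)
setkey(data, Date, Card)

# Build the grid from the key columns so no column names are hard-coded
key_cols <- key(data)
grid <- do.call(CJ, c(as.list(data[, ..key_cols]), unique = TRUE))

# With a key set on `data`, the join uses the key columns automatically
filled <- data[grid]
filled
```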
And from here, you can use nafill as desired, perhaps with by=.(Card). (How to fill is based on your knowledge of the context of the data; it might just as easily be by=.(Date), or some form of imputation.)

Update: the above does an expansion of all possible combinations, which might fill outside of a particular Card's span.

I think there are two approaches to this:
Doing the above code and then removing leading (and trailing) NAs per group.

Completely different approach (assuming Date-class, not strictly required above).
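Both approaches, followed by a fill, might look like the sketch below. The sample data is hypothetical (Card 2 starts a month later than Card 1), chosen so that the full expansion creates a row outside Card 2's span. The per-Card monthly sequence used for the second approach is my assumption of what a "completely different approach" could be, and the trimming step assumes A has no NAs of its own before the expansion.

```r
library(data.table)

# Hypothetical data: Card 2 starts in February and has no row for March
data <- data.table(
  Date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01",
                   "2020-02-01", "2020-04-01")),
  Card = c(1, 1, 1, 1, 2, 2),
  A    = c(10, 11, 12, 13, 20, 22)
)

# Approach 1: expand to all combinations, then keep only the rows between
# the first and last observed value of each Card
expanded <- data[CJ(Date = Date, Card = Card, unique = TRUE), on = .(Date, Card)]
trimmed <- expanded[, {
  obs <- which(!is.na(A))            # positions of real observations
  .SD[seq(min(obs), max(obs))]       # drop leading/trailing NA rows
}, by = Card]

# Approach 2: build a monthly sequence per Card over its own span,
# then join the data onto it
ranges <- data[, .(Date = seq(min(Date), max(Date), by = "month")), by = Card]
padded <- data[ranges, on = .(Card, Date)]

# Fill the remaining gaps, here by carrying the last observation forward
# within each Card (one choice among many)
filled <- copy(padded)[, A := nafill(A, type = "locf"), by = Card]
filled
```

In this example both approaches yield the same seven Card/Date rows; the January row that the full expansion creates for Card 2 is dropped by the first and never created by the second.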