I have the following data structure:
> dt
ID MiscInfo Date Val
1: A info_a 2000-01-01 0
2: A info_a 2000-01-03 3
3: B info_b 2001-01-01 1
4: B info_b 2001-01-04 5
5: B info_b 2001-01-07 13
Where Date
has some missing ID-wise entries where Val == 0
and MiscInfo
represents a set of N > 50
attribute variables. My end goal is to fill the missing entries such that the below structure is obtained.
> dt_pref
ID MiscInfo Date Val
1: A info_a 2000-01-01 0
2: A info_a 2000-01-02 0
3: A info_a 2000-01-03 3
4: B info_b 2001-01-01 1
5: B info_b 2001-01-02 0
6: B info_b 2001-01-03 0
7: B info_b 2001-01-04 5
8: B info_b 2001-01-05 0
9: B info_b 2001-01-06 0
10: B info_b 2001-01-07 13
Judging from similar requests, a rolling join is a decent route in achieving this. The issue I encounter is the inability to select which columns to roll, as described below:
drange = dt[, .(Date = seq(min(Date), max(Date), 1)), ID] %>% setkey(ID, Date)
dt[drange, roll = T]
ID MiscInfo Date Val
1: A info_a 2000-01-01 0
2: A info_a 2000-01-02 0
3: A info_a 2000-01-03 3
4: B info_b 2001-01-01 1
5: B info_b 2001-01-02 1
6: B info_b 2001-01-03 1
7: B info_b 2001-01-04 5
8: B info_b 2001-01-05 5
9: B info_b 2001-01-06 5
10: B info_b 2001-01-07 13
In this case, the MiscInfo
columns are appropriately rolled to my satisfaction, however the Val columns are of course also rolled whilst I wish to set them equal to 0. I can of course also take a step in the other direction by passing roll = 0
:
dt[drange, roll = 0]
ID MiscInfo Date Val
1: A info_a 2000-01-01 0
2: A <NA> 2000-01-02 NA
3: A info_a 2000-01-03 3
4: B info_b 2001-01-01 1
5: B <NA> 2001-01-02 NA
6: B <NA> 2001-01-03 NA
7: B info_b 2001-01-04 5
8: B <NA> 2001-01-05 NA
9: B <NA> 2001-01-06 NA
10: B info_b 2001-01-07 13
In this case, I can of course apply something like dt[is.na(Val), Val := 0]
, however handling the NA-entries of the MiscInfo
array (which is very large) using a similar route is not computationally efficient, and I suspect that there is a join-related way of carrying out this operation. In short, I want to preset Val to 0 for the "filled" entries, and roll the remaining columns in an efficient manner. Any ideas?
Reproducibles:
dt = data.table(
ID = c('A', 'A', 'B', 'B', 'B'),
MiscInfo = c(rep('info_a', 2), rep('info_b', 3)),
Date = as.Date(c('2000-01-01', '2000-01-03', '2001-01-01', '2001-01-04', '2001-01-07')),
Val = c(0,3,1,5,13)
) %>% setkey(ID, Date)
dt_pref = data.table(
ID = c(rep('A', 3), rep('B', 7)),
MiscInfo = c(rep("info_a", 3), rep("info_b", 7)),
Date = as.Date(c(10957, 10958, 10959, 11323, 11324, 11325, 11326, 11327, 11328, 11329), origin = '1970-01-01'),
Val = c(0, 0, 3, 1, 0, 0, 5, 0, 0, 13)
)
May be this could be used, even for more complex cases :