I have these three intervals defined:
library(lubridate)
YEAR_1 <- interval(ymd('2002-09-01'), ymd('2003-08-31'))
YEAR_2 <- interval(ymd('2003-09-01'), ymd('2004-08-31'))
YEAR_3 <- interval(ymd('2004-09-01'), ymd('2005-08-31'))
(in real life, I have 50 of these)
I have a dataframe (called df) with a column full of lubridate-formatted dates.
I'd like to append a new column to df holding the appropriate n from YEAR_n, depending on which interval the date falls within.
Something like:
df$YR <- ifelse(df$DATE %within% YEAR_1, 1, NA)
but I'm not sure how to proceed. I think I need to use an apply function somehow?
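One way to fill in that apply idea, sketched under the assumption that the intervals are kept in a list (here called years, a hypothetical name) instead of as 50 separate YEAR_n objects. sapply() builds one logical column per interval using lubridate's %within%, and apply() then picks, row by row, the index of the first interval that matched. The dates below are made-up examples, not the question's data.

```r
library(lubridate)

# The question's intervals, collected in a list (hypothetical name `years`)
years <- list(
  interval(ymd("2002-09-01"), ymd("2003-08-31")),
  interval(ymd("2003-09-01"), ymd("2004-08-31")),
  interval(ymd("2004-09-01"), ymd("2005-08-31"))
)

# Made-up example dates, including one outside every interval and one NA
df <- data.frame(DATE = as.Date(c("2002-10-15", "2004-02-29", "2005-08-31",
                                  "2006-01-01", NA)))

# One logical column per interval: is each date inside that interval?
hits <- sapply(years, function(iv) df$DATE %within% iv)

# For each row, the index of the first matching interval, or NA if none matched
df$YR <- apply(hits, 1, function(row) {
  idx <- which(row)  # which() silently drops NA entries
  if (length(idx) > 0) idx[1] else NA_integer_
})

df
```

This assumes df has more than one row; with a single row sapply() returns a vector rather than a matrix and apply() would fail.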
Here's my dataframe:
structure(c(1055289600, 1092182400, 1086220800, 1074556800, 1109289600,
1041897600, 1069200000, 1047427200, 1072656000, 1048636800, 1092873600,
1090195200, 1051574400, 1052179200, 1130371200, 1242777600, 1140652800,
1137974400, 1045526400, 1111104000, 1073952000, 1052870400, 1087948800,
1053993600, 1039564800, 1141603200, 1074038400, 1105315200, 1060560000,
1072051200, 1046217600, 1107129600, 1088553600, 1071619200, 1115596800,
1050364800, 1147046400, 1083628800, 1056412800, 1159747200, 1087257600,
1201478400, 1120521600, 1066176000, 1034553600, 1057622400, 1078876800,
1010880000, 1133913600, 1098230400, 1170806400, 1037318400, 1070409600,
1091577600, 1057708800, 1182556800, 1091059200, 1058227200, 1061337600,
1034121600, 1067644800, 1039478400, 1022198400, 1063065600, 1096329600,
1049760000, 1081728000, 1016150400, 1029801600, 1059350400, 1087257600,
1181692800, 1310947200, 1125446400, 1057104000, NA, 1085529600,
1037664000, 1091577600, 1080518400, 1110758400, 1092787200, 1094601600,
1169424000, 1232582400, 1058918400, 1021420800, 1133136000, 1030320000,
1060732800, 1035244800, 1090800000, 1129161600, 1055808000, 1060646400,
1028678400, 1075852800, 1144627200, 1111363200, 1070236800), class = c("POSIXct",
"POSIXt"), tzone = "UTC")
Everybody has their favourite tool for this; mine happens to be data.table because of what it refers to as its dt[i, j, by] logic. I create a data.table object, converting your times to dates for later comparison. I then set up a new column, defaulting to one. We then execute three conditional statements: for each of the three intervals (which I just create by hand using the endpoints), we set the YR value to 1, 2 or 3. This does have the desired effect.
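The data.table steps described above might look roughly like this. It is a sketch, not the answer's original code: the input vector is a made-up stand-in for the question's POSIXct data, and, unlike the description, the new column starts at NA_integer_ so that dates outside all three intervals are left unmatched rather than given a year; that default is an assumption.

```r
library(data.table)

# Made-up stand-in for the question's POSIXct vector `df`
df <- as.POSIXct(c("2002-10-15", "2004-02-29", "2005-08-31", "2009-05-20"),
                 tz = "UTC")

# data.table with the times converted to Date for the comparisons
dt <- data.table(date = as.Date(df))

# New integer column; NA marks dates outside every interval (sketch's assumption)
dt[, YR := NA_integer_]

# One conditional update per interval, endpoints typed in by hand
dt[date >= as.Date("2002-09-01") & date <= as.Date("2003-08-31"), YR := 1L]
dt[date >= as.Date("2003-09-01") & date <= as.Date("2004-08-31"), YR := 2L]
dt[date >= as.Date("2004-09-01") & date <= as.Date("2005-08-31"), YR := 3L]

dt
```

Because := modifies dt by reference, each update only touches the rows selected in i and leaves the others as they were.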
One could also have done this simply by computing date differences and truncating down, but it is nice to be a little explicit at times.
Edit: A more generic form just uses arithmetic on the dates and does the job in one line.
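A minimal sketch of one such arithmetic approach in base R, assuming every YEAR_n runs from 1 September to 31 August and YEAR_1 starts in 2002 (first_year is a hypothetical name). Rather than a raw day difference, it uses the calendar year and month fields, which avoids leap-year rounding right at the 1 September boundaries. The work is split over a few lines for readability, and the dates are made-up examples; for the question's data one would pass as.Date(df).

```r
# Made-up example dates; for the question's data use as.Date(df)
dates <- as.Date(c("2002-10-15", "2004-02-29", "2005-08-31", "2009-05-20", NA))

# Assumption: YEAR_1 begins on 2002-09-01, each YEAR_n runs Sep 1 to Aug 31
first_year <- 2002L

lt <- as.POSIXlt(dates)
# Calendar year in which each date's Sep-Aug year began; POSIXlt months are
# 0-based, so January..August are mon 0..7 and belong to the previous year
school_start <- lt$year + 1900L - (lt$mon < 8L)

# Whole years counted from the first one; NA dates stay NA
YR <- as.integer(school_start - first_year + 1L)
YR
```

Dates later than YEAR_3 simply get higher numbers (2009-05-20 falls in YEAR_7 under this assumption), so the same line covers all 50 intervals without listing them.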