I'm trying to incorporate day of the year (DOY) into my random forest model in R using sine and cosine transformations. I'm not using raw DOY because I'd like the model to understand that December 31st and January 1st are similar, which I don't believe values of 365 and 1 convey. A single sine or cosine mimics seasonality to some extent, but it is not one-to-one: sin(DOY) = y has two solutions per year (for example, a value of zero occurs on two dates). This can lead to a summer date and a winter date receiving the same sin(DOY) despite being very different. Is there a way to include a sine and cosine pair as a single feature, i.e. (sin(DOY), cos(DOY))? Or is there another way to include DOY in the model?
My current code is as follows:
dfSensor$DOYSin <- sin((dfSensor$DOY-173) * (2*pi)/365.25)
Here day 173 corresponds to June 22nd. This produces a value of +1 around September 21st and -1 around March 21st, but June 22nd and December 22nd both come out at around 0. The same ambiguity occurs no matter what shift I use for the day. However, I think adding a cosine column to my data frame and combining the sine and cosine transformations into one feature might resolve it.
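To make the cosine-column idea concrete, here is a minimal sketch using the same shift (173) and period (365.25) as my sine line. The small `dfSensor` built here and the column name `DOYCos` are made up for illustration; in my real data `dfSensor` already exists.

```r
# Hypothetical stand-in for the real dfSensor (only the DOY column matters here).
dfSensor <- data.frame(DOY = c(1, 81, 173, 264, 356, 365))

# Angle on the annual cycle, with the same shift and period as the sine line above.
angle <- (dfSensor$DOY - 173) * (2 * pi) / 365.25

# Sine and cosine of the same angle; together they place each day on a unit circle.
dfSensor$DOYSin <- sin(angle)
dfSensor$DOYCos <- cos(angle)

print(round(dfSensor, 3))
```

With both columns, June 22nd (DOY 173) becomes (0, 1) while December 22nd (DOY 356) becomes roughly (0, -1), so the two dates no longer collide, and DOY 365 and DOY 1 end up next to each other on the circle. These are still two separate columns, though, which is why I'm asking whether they can be treated as one feature.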