Using a Date vector to fill in missing dates in a time series data frame in R


I've been trying to fill in the gaps in this time series. I'm cleaning up a snowfall dataset covering October through May, where each "snowyear" starts in October (e.g. October 1954 through May 1955 is snowyear "1954"). The complete time series should include every day from 1954-10-01 through today.
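The snowyear rule above amounts to "October or later keeps the calendar year, otherwise subtract one", so it can be computed for a whole column at once without a loop. A minimal base-R sketch, where `snowyear_of()` is a hypothetical helper name:

```r
# Hypothetical helper: the snowyear is the calendar year in which the season
# began, with each season starting on October 1.
snowyear_of <- function(date) {
  year  <- as.integer(format(date, "%Y"))
  month <- as.integer(format(date, "%m"))
  # October to December keep their calendar year; January to September
  # belong to the season that started the previous October.
  ifelse(month >= 10, year, year - 1L)
}

dates <- as.Date(c("1954-10-01", "1954-12-22", "1955-01-01",
                   "1955-05-31", "1955-10-01"))
snowyear_of(dates)
#> [1] 1954 1954 1954 1954 1955
```

With the question's existing columns, the same rule should reduce to `ifelse(man$mm >= 10, man$yy, man$yy - 1)`, assuming `mm` and `yy` are read in as numbers.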

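# read.delim() returns a data.frame, so creating an empty data.table() first is unnecessary
man <- read.delim("mansfieldstake.txt", header = TRUE, check.names = FALSE)
# Note: this turns every NA (including unmeasured depths) into 0
man[is.na(man)] <- 0
# Build a Date from the year, month and day columns
man$date <- as.Date(paste(man$yy, man$mm, man$dd, sep = "-"), format = "%Y-%m-%d")
# New names must follow the existing column order (mm dd yy depth date snowyear snowday)
colnames(man) <- c("month", "day", "year", "depth", "date", "snowyear", "snowday")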

This gives a data frame like the one below, with some dates missing (i.e. no row at all, rather than a row with man$depth = 0, on days when no measurement was taken):

mm dd   yy depth       date snowyear snowday
12 22 1954    24 1954-12-22       NA      NA
12 23 1954    24 1954-12-23       NA      NA
12 24 1954    24 1954-12-24       NA      NA
12 25 1954    30 1954-12-25       NA      NA
12 26 1954    36 1954-12-26       NA      NA
12 27 1954     0 1954-12-27       NA      NA
12 28 1954    36 1954-12-28       NA      NA
12 29 1954    30 1954-12-29       NA      NA
12 30 1954     0 1954-12-30       NA      NA
12 31 1954     0 1954-12-31       NA      NA
 1  1 1955     0 1955-01-01       NA      NA
 1  3 1955    36 1955-01-03       NA      NA
 1  4 1955    36 1955-01-04       NA      NA
 1  6 1955    36 1955-01-06       NA      NA

Next, I create a vector of all dates that should be in the time series. This is where I'm stuck: basically, I need to find the values in daily_vector that are not in man$date. I'm all over the place here:

daily_vector <- seq(as.Date("1954-10-01"), as.Date("2016-12-12"), by="days")
missing_datetest <- !daily_vector %in% man$date
missingdates<- daily_vector[missing_datetest]
missingdates
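The `%in%` approach above should already find the gaps. As a quick check, here is a sketch on toy dates mirroring the excerpt (where 1955-01-02 and 1955-01-05 have no row); the result comes back as a `Date` vector, so the class is preserved through the subsetting:

```r
# Toy stand-in for man$date: 1955-01-02 and 1955-01-05 are absent
observed <- as.Date(c("1954-12-30", "1954-12-31", "1955-01-01",
                      "1955-01-03", "1955-01-04", "1955-01-06"))

# Every day that should be present over the same span
daily_vector <- seq(as.Date("1954-12-30"), as.Date("1955-01-06"), by = "day")

# %in% binds tighter than !, so this is !(daily_vector %in% observed)
missingdates <- daily_vector[!daily_vector %in% observed]
missingdates
#> [1] "1955-01-02" "1955-01-05"
class(missingdates)
#> [1] "Date"
```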

I'm looking to pull out a vector of these missing dates and then merge (or join?) them with man$date, inserting empty rows for man$depth. Those rows will then be filled with an average or removed, depending on the surrounding values. For example, if man$depth on the days around a gap was 60 inches of snow, the missing value should be the average of the ~2 days before and after, since the snow didn't disappear for a day.
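For the averaging step, one option is linear interpolation with base R's `approx()`, which fills each gap from the nearest observed day before and after it. This is a swapped-in technique rather than the exact "~2 days before and after" average described above, and it assumes the gap days hold `NA`; zeros produced by `man[is.na(man)] <- 0` would be treated as real readings. A minimal sketch on toy values:

```r
# Toy series as it might look after the missing days were inserted as
# empty rows: depth is NA on the days with no measurement.
full <- data.frame(
  date  = seq(as.Date("1955-01-01"), as.Date("1955-01-06"), by = "day"),
  depth = c(30, NA, 36, 36, NA, 40)
)

# Interpolate linearly between the nearest observed days on either side.
# rule = 2 carries the end values outward if the series starts or ends in NA.
ok <- !is.na(full$depth)
full$depth_filled <- approx(x    = as.numeric(full$date[ok]),
                            y    = full$depth[ok],
                            xout = as.numeric(full$date),
                            rule = 2)$y
full$depth_filled
#> [1] 30 33 36 36 38 40
```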

The date classes need to match, and merging a vector with a data frame (or data.table?) seems to be giving me problems. It looks like I'm just missing something fundamental here.
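One way around merging a bare vector is to wrap it in a one-column data frame whose column name and class (`Date`) match `man$date`, then merge with `all.x = TRUE` so every calendar day is kept. This inserts the empty rows directly, so a separate vector of missing dates is not strictly needed. A base-R sketch with a toy `man` built from the excerpt:

```r
# Toy stand-in for `man`: observed rows only (1955-01-02 and 1955-01-05 absent)
man <- data.frame(
  date  = as.Date(c("1955-01-01", "1955-01-03", "1955-01-04", "1955-01-06")),
  depth = c(0, 36, 36, 36)
)

daily_vector <- seq(as.Date("1955-01-01"), as.Date("1955-01-06"), by = "day")

# Keep every row of the complete calendar; days with no match in man get
# NA for depth. merge() sorts the result by date by default.
full <- merge(data.frame(date = daily_vector), man, by = "date", all.x = TRUE)

full$depth
#> [1]  0 NA 36 36 NA 36
```

The same idea should carry over to data.table as a join, e.g. `man[data.table(date = daily_vector), on = "date"]` after `setDT(man)`, though that variant is not exercised here.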

Once that is done, I want to add a "snowday" column that counts days from October 1 of each man$snowyear, and then use it to plot an October-through-May time series for each snowyear. I can do this with a loop, but can it be done faster with a vectorized approach? Or could I drop man$snowday entirely if there is a way to plot such a cross-year date range in ggplot? Does anybody have some insight?
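Because date subtraction is vectorized in R, the snowday column can be computed in one step: subtract each row's season start (October 1 of its snowyear) from its date. A sketch on toy rows, assuming a numeric `snowyear` column already exists; the ggplot2 part runs only if that package is installed:

```r
# Toy data spanning two snowyears, with snowyear already assigned
full <- data.frame(
  date     = as.Date(c("1954-10-01", "1954-12-22", "1955-05-31",
                       "1955-10-01", "1956-01-01")),
  depth    = c(0, 24, 5, 0, 30),
  snowyear = c(1954, 1954, 1954, 1955, 1955)
)

# Day 1 is October 1 of each row's snowyear; no loop is needed because
# both operands are whole vectors.
season_start <- as.Date(paste0(full$snowyear, "-10-01"))
full$snowday <- as.integer(full$date - season_start) + 1L
full$snowday
#> [1]   1  83 243   1  93

# One line per season on a shared "days since October 1" axis
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(full, aes(x = snowday, y = depth, colour = factor(snowyear))) +
    geom_line() +
    labs(x = "Days since October 1", y = "Snow depth", colour = "Snowyear")
  print(p)
}
```

Seasons that include February 29 shift every later snowday by one relative to other years, which is usually negligible on a season-long plot. If you would rather keep real dates on the x axis, `facet_wrap(~ snowyear, scales = "free_x")` is another option, at the cost of one panel per season.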
