I've been trying to fill in the gaps in a time series. I'm cleaning up a dataset based on snowfall from October through May, where a "snowyear" starts each October (e.g. October 1954 through May 1955 is snowyear "1954"). The complete time series should include every day from 1954-10-01 through today.
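# Read the raw stake data (tab-delimited, with a header row)
man <- read.delim('mansfieldstake.txt', header = TRUE, check.names = FALSE)
# Unrecorded values are set to 0 here
man[is.na(man)] <- 0
# Build a Date column from the year, month and day columns
man$date <- paste(man$yy, man$mm, man$dd, sep = "-")
man$date <- as.Date(man$date, format = "%Y-%m-%d")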
# Placeholder columns, filled in later (assumes the file holds only mm, dd, yy, depth)
man$snowyear <- NA
man$snowday <- NA
This gives a data frame like the one below. Some dates are missing entirely, and others have man$depth = 0 only because no measurement was taken that day:
mm dd   yy depth       date snowyear snowday
12 22 1954    24 1954-12-22       NA      NA
12 23 1954    24 1954-12-23       NA      NA
12 24 1954    24 1954-12-24       NA      NA
12 25 1954    30 1954-12-25       NA      NA
12 26 1954    36 1954-12-26       NA      NA
12 27 1954     0 1954-12-27       NA      NA
12 28 1954    36 1954-12-28       NA      NA
12 29 1954    30 1954-12-29       NA      NA
12 30 1954     0 1954-12-30       NA      NA
12 31 1954     0 1954-12-31       NA      NA
 1  1 1955     0 1955-01-01       NA      NA
 1  3 1955    36 1955-01-03       NA      NA
 1  4 1955    36 1955-01-04       NA      NA
 1  6 1955    36 1955-01-06       NA      NA
The next step creates a date vector holding every date that should be in the time series. This is where I'm hung up: basically, I need to find the values in daily_vector that are not in man$date. I'm all over the place here:
daily_vector <- seq(as.Date("1954-10-01"), as.Date("2016-12-12"), by="days")
missing_datetest <- !daily_vector %in% man$date
missingdates<- daily_vector[missing_datetest]
missingdates
I'm looking to pull out a vector of these missing dates and then merge (or join?) them with man$date, inserting rows with an empty man$depth. Those empty values will then be averaged or removed depending on the surrounding values (e.g. if man$depth on the days around a gap was 60 inches of snow, the gap would be filled with the average of the ~2 days before and after, since the snow didn't disappear for a day).
The date classes need to match, and merging a vector with a data frame (or data.table?) is giving me problems. It looks like I'm missing something fundamental here.
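For reference, here is a minimal base R sketch of the merge described above. It is not tested against the real file: `man_toy`, `all_days`, `filled` and `depth_interp` are hypothetical names, and the toy values are made up for illustration. The idea is to wrap the full date sequence in a one-column data frame so that both sides have a `date` column of class Date, then left-join with `merge(..., all.x = TRUE)`. The gap filling uses linear interpolation via `approx()`, which is a stand-in for the "average of the days before and after" idea (for a one-day gap the two are the same).

```r
# Toy stand-in for `man` (made-up values); several dates are absent
man_toy <- data.frame(
  date  = as.Date(c("1954-12-26", "1954-12-28", "1955-01-01", "1955-01-03")),
  depth = c(36, 36, 30, 36)
)

# Every date that should be present, as a one-column data frame of class Date
all_days <- data.frame(
  date = seq(min(man_toy$date), max(man_toy$date), by = "day")
)

# Left join: keep every calendar day; days without a measurement get depth = NA
filled <- merge(all_days, man_toy, by = "date", all.x = TRUE)

# Fill the NA gaps by linear interpolation between the nearest measured days
known <- !is.na(filled$depth)
filled$depth_interp <- approx(
  x    = as.numeric(filled$date[known]),
  y    = filled$depth[known],
  xout = as.numeric(filled$date)
)$y
```

For this to work on the real data, unmeasured days would probably need to stay NA rather than being set to 0, otherwise the zeros are treated as real measurements.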
Once that is done, I want to add a "snowday" column that counts days starting from October 1st of each man$snowyear, and then use it to plot a time series from October through May for each "snowyear". I can do this with a loop, but can it be done with a faster, vectorized approach? Alternatively, I could drop man$snowday if there is a way to plot such an inter-year time range in ggplot. Does anybody have some insight?
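To make the snowyear/snowday idea concrete, here is a loop-free sketch in base R. The `dates` vector is a hypothetical example, and the rule is the one stated above: October through December belong to the snowyear of their own calendar year, and earlier months belong to the snowyear that began the previous October. Treating October 1st as snowday 1 (rather than 0) is an assumption.

```r
# Hypothetical example dates spanning two snow years
dates <- as.Date(c("1954-10-01", "1954-12-31", "1955-01-01",
                   "1955-05-31", "1955-10-01"))

yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))

# Oct-Dec: snow year is the calendar year; Jan-Sep: the previous calendar year
snowyear <- ifelse(mo >= 10, yr, yr - 1L)

# Days elapsed since October 1st of the snow year, counting October 1st as day 1
snowday <- as.integer(dates - as.Date(paste0(snowyear, "-10-01"))) + 1L
```

With these two columns, one plausible way to overlay seasons in ggplot would be to map snowday to x, depth to y, and group or colour the lines by factor(snowyear).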