I'm new to R so open to any suggestions for improvement.
I'm importing some financial data and checking it for missing records. One way I have come up with is to create a POSIXct vector and convert it into a format that can be compared with the imported data. See the code below:
DateTimeC = seq(as.POSIXct("1986/12/1"), as.POSIXct("1986/12/2"), "mins")
DateTimeC = format(as.POSIXct(DateTimeC,format='%Y.%m.%d %H:%M'),format='%Y.%m.%d %H:%M')
DateTimeC = data.frame(DateTimeC)
colnames(DateTimeC) = c('DateTime')
The above creates the list I require for my test period. I then compare this test list against the imported data and delete any matches:
DataDelete = DateTimeC[!DateTimeC$DateTime %in% DateTime$DateTime, ]
DataError = data.frame(DataDelete)
colnames(DataError) = c('DateTime')
The next stage I've got to is creating data frames for the Xmas holiday and the New Year holiday. I then compare these data frames to the data and delete any matches. This follows exactly the same process as above, except that the date and time data frames cover the Xmas and New Year periods.
The problem is that my data set spans 28 years, so I would need to repeat the above process 56 times to get the desired result.
Questions
- Is there something in the as.POSIXct function / structure that will allow me to specify that I want the date and time, by minute, for that day, of every year from X to Y? Or will I have to do this manually?
- Does anyone have an elegant solution to this problem?
Technically, there are 24 * 60 = 1440 minutes in each day, and the ISO 8601 standard defines 00:00 as the first moment of a new date. Your seq() call runs from midnight to midnight inclusive, so it returns 1441 values, the last of which belongs to the following day. Unless the legacy code you are matching also allocates 1441 minutes to the timeslices of interest, you may wish to adjust your seq() call. In what follows I assume that this simplification will be acceptable.
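To make the off-by-one concrete, here is a minimal sketch that counts what the midnight-to-midnight call produces and shows one way to stop at 23:59. The tz = "UTC" argument is an assumption added here so the counts do not depend on the local time zone; it is not part of the original code.

```r
# Midnight to midnight inclusive, as in the question
# (tz = "UTC" is an assumption added for reproducibility)
full <- seq(as.POSIXct("1986-12-01", tz = "UTC"),
            as.POSIXct("1986-12-02", tz = "UTC"), by = "min")
n_full <- length(full)   # includes 00:00 of 2 December

# Stop one minute earlier so that only 1 December is covered
day <- seq(as.POSIXct("1986-12-01 00:00", tz = "UTC"),
           as.POSIXct("1986-12-01 23:59", tz = "UTC"), by = "min")
n_day <- length(day)
```

Comparing n_full with n_day shows the extra minute that the original endpoints bring in.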
To start with, your existing code could be written a little more concisely:
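One possible shorter form, offered as a sketch rather than the only way to do it, builds the sequence and formats it in a single step. The tz = "UTC" setting and the use of length.out to get exactly 1440 minutes are assumptions on my part.

```r
# Start of the test day (tz = "UTC" is an assumption, not in the question)
start <- as.POSIXct("1986-12-01 00:00", tz = "UTC")

# 24 * 60 one-minute steps, formatted straight to character;
# the intermediate as.POSIXct() round trip from the question is not needed
DateTimeC <- format(seq(start, by = "min", length.out = 24 * 60),
                    "%Y.%m.%d %H:%M", tz = "UTC")
```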
Notice that you can specify the column name directly in the data.frame call:
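As an illustration, the missing-data step could name its column at creation time instead of calling colnames() afterwards. The three expected stamps and the small DateTime data frame below are made-up stand-ins for the real imported data, and setdiff() is swapped in for the %in% subsetting used in the question.

```r
# Made-up expected stamps and imported data (00:01 is missing on purpose)
expected <- c("1986.12.01 00:00", "1986.12.01 00:01", "1986.12.01 00:02")
DateTime <- data.frame(DateTime = c("1986.12.01 00:00", "1986.12.01 00:02"),
                       stringsAsFactors = FALSE)

# Column name given directly in the data.frame() call
DataError <- data.frame(DateTime = setdiff(expected, DateTime$DateTime),
                        stringsAsFactors = FALSE)
```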
If you wanted to automate over a set of days within years you could do something like this:
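A rough sketch of that idea with plain for loops follows. The year range 1986:2013 and the choice of 25 December and 1 January as the holiday dates are assumptions, since the question does not say exactly which days are meant; adjust both to the real data.

```r
years <- 1986:2013            # assumed 28-year range
days  <- c("12-25", "01-01")  # assumed holiday dates, as month-day

holiday <- character(0)
for (y in years) {
  for (d in days) {
    # Midnight at the start of the holiday (tz = "UTC" is an assumption)
    start  <- as.POSIXct(paste0(y, "-", d, " 00:00"), tz = "UTC")
    stamps <- seq(start, by = "min", length.out = 24 * 60)
    holiday <- c(holiday, format(stamps, "%Y.%m.%d %H:%M", tz = "UTC"))
  }
}

# One data frame holding every holiday minute across all years
Holidays <- data.frame(DateTime = holiday, stringsAsFactors = FALSE)
```

The resulting Holidays data frame can then be compared against the imported data in the same way as the single-day test period.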
You may also want to look into the *apply family of functions (see ?lapply) which arguably provide a more elegant solution but require you to be comfortable manipulating list objects.
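For comparison, the same result could be sketched with lapply(). The helper minutes_of_day() is a hypothetical name introduced here, and the year range and holiday dates are again assumptions.

```r
years <- 1986:2013   # assumed 28-year range

# Hypothetical helper: every minute of one day, formatted as character
minutes_of_day <- function(date_string) {
  start <- as.POSIXct(paste(date_string, "00:00"), tz = "UTC")
  format(seq(start, by = "min", length.out = 24 * 60),
         "%Y.%m.%d %H:%M", tz = "UTC")
}

# Assumed holiday dates: 25 December and 1 January of each year
dates  <- c(paste0(years, "-12-25"), paste0(years, "-01-01"))
stamps <- lapply(dates, minutes_of_day)   # list with one element per date

Holidays <- data.frame(DateTime = unlist(stamps), stringsAsFactors = FALSE)
```

lapply() returns a list, so unlist() is used to flatten it back into a single character vector before building the data frame.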