Can I create a POSIXct data.frame covering one day in minute steps, for the same day in each of 28 years?


I'm new to R so open to any suggestions for improvement.

I'm importing some financial data and checking for missing data. One way I have come up with to do this is to create a POSIXct vector and convert it into a comparable format. Please see the code below:

DateTimeC = seq(as.POSIXct("1986/12/1"), as.POSIXct("1986/12/2"), "mins")   
DateTimeC = format(as.POSIXct(DateTimeC,format='%Y.%m.%d %H:%M'),format='%Y.%m.%d %H:%M')
DateTimeC = data.frame(DateTimeC)
colnames(DateTimeC) = c('DateTime') 

The above creates the list I require for my test period. I then compare this test list against the imported data and delete any matches:

DataDelete = DateTimeC[!DateTimeC$DateTime %in% DateTime$DateTime, ] 
DataError = data.frame(DataDelete)
colnames(DataError) = c('DateTime') 

The next stage I've got to is creating data frames for the Xmas holiday and the New Year holiday. I then compare these data frames to the data and delete any matches. This follows exactly the same process as above, except that the date and time data frames cover the Xmas and New Year periods.

The problem is that my data set spans 28 years, so I would need to repeat the above process 56 times to get the desired result.

Questions

  • Is there something in the as.POSIXct function / structure that will let me specify that I want the date and time, by minute, for a given day in every year from X to Y? Or will I have to do this manually?
  • Does anyone have an elegant solution to this problem?

There are 2 best solutions below

Best answer

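Technically, there are 24 * 60 = 1440 minutes in each day, but your seq() call runs from midnight to midnight inclusive and therefore returns 1441 values. The last of these, 00:00, belongs to the next date: the ISO 8601 standard defines 00:00 to be the initial moment of a new date. Unless the legacy code you are matching also allocates 1441 minutes to the timeslices of interest, you may wish to adjust your seq() call so that it stops at 23:59. In what follows I assume that this simplification will be acceptable.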
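As a quick check of the 1440 versus 1441 point, here is a minimal sketch. It assumes tz = "UTC" so that the result does not depend on the local time zone setting; the variable names are only for illustration.

```r
# tz = "UTC" is an assumption; use the zone your data was recorded in.
start <- as.POSIXct("1986-12-01 00:00", tz = "UTC")

# Inclusive end point at the next midnight: one value too many for a single day.
inclusive <- seq(start, as.POSIXct("1986-12-02 00:00", tz = "UTC"), by = "min")

# Fixed length instead of an end point: exactly the minutes of 1 December.
one_day <- seq(start, by = "min", length.out = 24 * 60)

length(inclusive)  # 1441
length(one_day)    # 1440
```

Using length.out rather than an explicit end time avoids having to spell out 23:59 by hand.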

To start with, your existing code could be written a little more concisely:

ts  <- seq(as.POSIXct('1986/12/1 00:00'), as.POSIXct('1986/12/1 23:59'), 'mins')
dtc <- data.frame(DateTime=strftime(ts, format='%Y.%m.%d %H:%M'))

Notice that you can specify the column name directly in the data.frame call. The same applies to the comparison step, where dt stands for your imported data frame:

de  <- data.frame(DateTime=dtc[!dtc$DateTime %in% dt$DateTime,])

If you wanted to automate over a set of days within years you could do something like this:

for (year in 1986:2014) {
    for (day in c('1/1','12/1','12/25')) {
        dd  <- paste(year,day,sep='/')
        # one timestamp per minute, 00:00 through 23:59
        ts  <- seq(as.POSIXct(paste(dd,'00:00')), as.POSIXct(paste(dd,'23:59')), 'mins')
        dtc <- data.frame(DateTime=strftime(ts, format='%Y.%m.%d %H:%M'))
        # expected timestamps that are missing from the imported data (dt)
        de  <- data.frame(DateTime=dtc[!dtc$DateTime %in% dt$DateTime,])

        # ... further processing here ...
    }
}

You may also want to look into the *apply family of functions (see ?lapply) which arguably provide a more elegant solution but require you to be comfortable manipulating list objects.
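For what it's worth, an lapply version might look roughly like the sketch below. The names years, days, minutes_of_day and imported are made up for illustration, tz = "UTC" is an assumption, and setdiff() is swapped in for the negated %in% test used earlier. The imported data frame is only a stand-in for your real data.

```r
# Hypothetical inputs: the years and the month-day pairs to check.
years <- 1986:2014
days  <- c("01-01", "12-01", "12-25")

# Every year/day combination as a "YYYY-MM-DD" string.
grid  <- expand.grid(day = days, year = years, stringsAsFactors = FALSE)
dates <- paste(grid$year, grid$day, sep = "-")

# The 1440 expected minute stamps for one date, formatted like the
# imported data. tz = "UTC" is an assumption.
minutes_of_day <- function(date, tz = "UTC") {
  ts <- seq(as.POSIXct(paste(date, "00:00"), tz = tz),
            by = "min", length.out = 24 * 60)
  format(ts, "%Y.%m.%d %H:%M")
}

expected <- unlist(lapply(dates, minutes_of_day))

# Stand-in for the imported data: pretend only the first three minutes exist.
imported <- data.frame(DateTime = expected[1:3], stringsAsFactors = FALSE)

# Expected stamps that never appear in the import.
missing_stamps <- data.frame(DateTime = setdiff(expected, imported$DateTime))
```

This builds all expected stamps once and does a single comparison, instead of one comparison per day and year.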

Second answer

You could use something like the following to create all the dates for different years as required:

DateList <- lapply(1999:2010, function(year){ 
                      seq(as.POSIXct(paste0(year, "/12/1")),
                          as.POSIXct(paste0(year, "/12/2")), "mins")})

names(DateList) <- 1999:2010

The result is a list with one vector of date-times per year, for the years 1999 to 2010:

> str(DateList)
List of 12
 $ 1999: POSIXct[1:1441], format: "1999-12-01 00:00:00" "1999-12-01 00:01:00" "1999-12-01 00:02:00" "1999-12-01 00:03:00" ...
 $ 2000: POSIXct[1:1441], format: "2000-12-01 00:00:00" "2000-12-01 00:01:00" "2000-12-01 00:02:00" "2000-12-01 00:03:00" ...
 $ 2001: POSIXct[1:1441], format: "2001-12-01 00:00:00" "2001-12-01 00:01:00" "2001-12-01 00:02:00" "2001-12-01 00:03:00" ...
 $ 2002: POSIXct[1:1441], format: "2002-12-01 00:00:00" "2002-12-01 00:01:00" "2002-12-01 00:02:00" "2002-12-01 00:03:00" ...
 $ 2003: POSIXct[1:1441], format: "2003-12-01 00:00:00" "2003-12-01 00:01:00" "2003-12-01 00:02:00" "2003-12-01 00:03:00" ...
 $ 2004: POSIXct[1:1441], format: "2004-12-01 00:00:00" "2004-12-01 00:01:00" "2004-12-01 00:02:00" "2004-12-01 00:03:00" ...
 $ 2005: POSIXct[1:1441], format: "2005-12-01 00:00:00" "2005-12-01 00:01:00" "2005-12-01 00:02:00" "2005-12-01 00:03:00" ...
 $ 2006: POSIXct[1:1441], format: "2006-12-01 00:00:00" "2006-12-01 00:01:00" "2006-12-01 00:02:00" "2006-12-01 00:03:00" ...
 $ 2007: POSIXct[1:1441], format: "2007-12-01 00:00:00" "2007-12-01 00:01:00" "2007-12-01 00:02:00" "2007-12-01 00:03:00" ...
 $ 2008: POSIXct[1:1441], format: "2008-12-01 00:00:00" "2008-12-01 00:01:00" "2008-12-01 00:02:00" "2008-12-01 00:03:00" ...
 $ 2009: POSIXct[1:1441], format: "2009-12-01 00:00:00" "2009-12-01 00:01:00" "2009-12-01 00:02:00" "2009-12-01 00:03:00" ...
 $ 2010: POSIXct[1:1441], format: "2010-12-01 00:00:00" "2010-12-01 00:01:00" "2010-12-01 00:02:00" "2010-12-01 00:03:00" ...

To access the dates in 2009, for example, you can now use:

DateList[["2009"]]
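If a single vector is easier to compare against the imported data, the per-year vectors can be joined back together. This is only a sketch: it rebuilds the list with tz = "UTC" and length.out = 1440 (both assumptions, so each year has 1440 rather than 1441 values), and the names all_minutes and stamps are made up for illustration.

```r
years <- 1999:2010

# Same idea as DateList above, but with a fixed time zone (assumed UTC)
# and exactly 1440 minutes per day.
DateList <- lapply(years, function(year) {
  seq(as.POSIXct(paste0(year, "-12-01"), tz = "UTC"),
      by = "min", length.out = 24 * 60)
})
names(DateList) <- years

# Join the per-year POSIXct vectors into one POSIXct vector.
all_minutes <- do.call(c, unname(DateList))

# Format like the question's data; tz is passed explicitly in case the
# time zone attribute was dropped when the vectors were joined.
stamps <- format(all_minutes, "%Y.%m.%d %H:%M", tz = "UTC")

length(stamps)  # 17280, i.e. 12 years * 1440 minutes
```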