R - Select files by dates in filenames

I already had a similar question here: R - How to choose files by dates in file names?

But I need to make a small change.

I still have a list of filenames, similar to that:

list = c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",

I already have a command to select files that have a certain length of recording (for example 10 years in this case):

#Listing Files (creates the list above)
files = list.files(pattern="*00007.*dymax", recursive = TRUE)

#Making date readable
split_daymax = strsplit(files, split=".", fixed=TRUE)

from = unlist(lapply(split_daymax, "[[", 2))
to = unlist(lapply(split_daymax, "[[", 3))
from = as.POSIXct(from, format="%d-%m-%Y")
to = as.POSIXct(to, format="%d-%m-%Y")

timelistmax = difftime(to, from, units = "days")

#Files with more than 10 years of recording
index = timelistmax >= 10*360
files = files[index]

My problem is now that I have way too many files and no computer can handle that.

Now I only want to read in files that contain data from 1993 (or any other year I choose) onwards and have 10 years of recording from then on, so the recordings should run at least until 2003.

So the file 1973-1994 should not be included, but the file from 1981-2011 is fine.

I don't know how to select a year in this case.

I am thankful for any help


library(stringr)    # str_extract_all()
library(lubridate)  # year()

# Pull the start and end dates (d-m-yyyy) out of each file name
fileDates <- str_extract_all(files, "[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}")

find_file <- function(x, whichYear, noYears = 10) {
  start <- as.Date(x[[1]], "%d-%m-%Y")
  end <- as.Date(x[[2]], "%d-%m-%Y")
  # Years of recording from whichYear onwards
  years <- as.numeric(end-whichYear, units = "days")/365
  years > noYears & (year(start) <= year(whichYear) & 
                       year(end) >= year(whichYear))
}

sapply(fileDates, find_file, whichYear = as.Date("1993-01-01"), noYears = 10)

You have two conditions. First calculate the number of years of recording since 1993, then use boolean logic to check whether 1993 falls within the file's date range.
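To make the two conditions concrete, here is a minimal base R sketch that evaluates them for the single example file name from the question. The names `target`, `years_after`, `in_range` and the helper `year_of` are made up for illustration and are not part of the answer's code.

```r
# Dates taken from the example file name in the question
start  <- as.Date("1-1-1993", format = "%d-%m-%Y")
end    <- as.Date("31-12-2003", format = "%d-%m-%Y")
target <- as.Date("1993-01-01")   # assumed year of interest

# Hypothetical helper: calendar year of a Date as an integer
year_of <- function(d) as.integer(format(d, "%Y"))

# Condition 1: years of recording from the target date onwards
years_after <- as.numeric(end - target, units = "days") / 365

# Condition 2: the target year lies inside the recorded range
in_range <- year_of(start) <= year_of(target) &
  year_of(end) >= year_of(target)

# The file qualifies only if both conditions hold
years_after > 10 & in_range
```

Dividing by 365 ignores leap days, so a recording that ends very close to the ten-year mark may need a slightly more careful comparison.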


Using files, to, and from as you've defined them above, this should get you the files that contain at least a ten-year span of data between 1993 and 2003:

library(lubridate)  # provides year()

df <- data.frame(file_name = files, file_start = from, file_end = to)
df_index <- year(df$file_start) <= 1993 & year(df$file_end) >= 2003
files_to_load <- df$file_name[df_index]

If a base-only solution is desired, convert the POSIXct values to POSIXlt and extract the year component like so:

# data.frame() stores POSIXlt columns as POSIXct, so extract the years first
df <- data.frame(file_name = files, 
                 start_year = as.POSIXlt(from)$year + 1900, 
                 end_year = as.POSIXlt(to)$year + 1900)

df_index <- (df$start_year <= 1993 & 
             df$end_year >= 2003)

files_to_load <- df$file_name[df_index]
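
As a self-contained check of the year-based selection, the sketch below runs it on three file names. Only the first name comes from the question; the other two are invented to mirror the 1973-1994 and 1981-2011 cases mentioned there, and `covers_period` is a hypothetical helper, not a function from any package.

```r
# Hypothetical file names; only the first appears in the question
files <- c("AT0ACH10000700100dymax.1-1-1993.31-12-2003",
           "AT0ACH20000700100dymax.1-1-1973.31-12-1994",
           "AT0ACH30000700100dymax.1-1-1981.31-12-2011")

# Split "name.start.end" on literal dots and parse the two dates
parts <- strsplit(files, ".", fixed = TRUE)
from  <- as.Date(sapply(parts, `[[`, 2), format = "%d-%m-%Y")
to    <- as.Date(sapply(parts, `[[`, 3), format = "%d-%m-%Y")

# Hypothetical helper: TRUE when a recording starts in or before `year`
# and runs at least until `year + n_years`
covers_period <- function(from, to, year, n_years = 10) {
  start_year <- as.integer(format(from, "%Y"))
  end_year   <- as.integer(format(to, "%Y"))
  start_year <= year & end_year >= year + n_years
}

keep <- covers_period(from, to, year = 1993)
files_to_load <- files[keep]
files_to_load
```

Wrapping the test in a function with `year` and `n_years` arguments makes it easy to rerun the selection for a different start year or recording length.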