Filter GetFiles() on multiple criteria


I wasn't sure how to word my question title, but hopefully this will be quick.

I have a bunch of PDF files in a folder, and I want to list the ones of a certain type that were recently modified or created. My code works, but it gives me results for all the PDF files. I only want results for files with certain names.

For example, in "c:\temp\" I have some PDF files whose names start with one of a few similar three-letter prefixes, followed by numbers or more letters.

File names such as tes8796, fes8897895, bas232, etc. I only want to be shown results for the "fes" and "bas" files; I do not want to see the "tes" files.

Any ideas?

I believe the filter should look something like this: {"fes*.pdf", "Bas*.pdf"}

My code that works (I want results for the "fes" and "bas" files only):

Dim pathx As String = "C:\temp\"
Dim directory = New DirectoryInfo(pathx)
Dim from_date As DateTime = DateTime.Now.AddHours(-24)
Dim to_date As DateTime = DateTime.Now
Dim files = directory.GetFiles().Where(Function(file) file.LastWriteTime >= from_date AndAlso file.LastWriteTime <= to_date)
For Each filx In files
    ListBox1.Items.Add(filx)
Next

There are 3 best solutions below

0
On BEST ANSWER

There are multiple ways to do this. First, you can apply the PDF filter at the start: directory.GetFiles("*.pdf"), then just extend your function conditions:

Dim di = New DirectoryInfo(pathx)          ' the snippets in this answer call the folder di
Dim names() As String = {"fes", "bas"}     ' lowercase prefixes to keep

Dim files = di.GetFiles("*.pdf").Where(
        Function(file) (file.LastWriteTime >= from_date _
               AndAlso file.LastWriteTime <= DateTime.Now) _
                  AndAlso (names.Contains(file.Name.ToLowerInvariant.Substring(0, 3)))
            )

For Each filx In files
    Console.WriteLine(filx)
Next

The problem is that this depends on the name prefixes always being exactly 3 characters long. Also, look at your code: once you have the results, you loop through them to add each one to the ListBox. Usually the cool kids use LINQ to avoid loops. So, to fix both issues:

ListBox1.Items.AddRange(di.GetFiles("*.pdf").Where(
        Function(f) (f.LastWriteTime >= from_date AndAlso f.LastWriteTime <= DateTime.Now) _
            AndAlso names.Any(Function(n) f.Name.ToLowerInvariant.StartsWith(n))
            ).ToArray)

I got rid of variable names that already exist in .NET as types, such as Directory and File. The second function filters on the names array with StartsWith, so the prefixes can be any length, and the results are pumped directly into the ListBox items collection.

In almost all cases, a plain loop will be faster:

Dim myfiles = directory.GetFiles("*.pdf")
For Each f In myfiles
    If f.LastWriteTime >= from_date AndAlso f.LastWriteTime <= DateTime.Now Then
        For Each s As String In names
            If f.Name.ToLowerInvariant.StartsWith(s.ToLowerInvariant) Then
                Console.WriteLine(f)
                Exit For                     ' abort when found
            End If
        Next
    End If
Next

  • probably more understandable for a self-proclaimed newbie in VB.NET
  • this can be incrementally debugged
  • it is 2 to 100 times faster to process (depending on the number of files and name filters) [a]
  • will work with varying file name filters
  • combines the finding/filtering operation with posting to eliminate the extra loop (Console.WriteLine could be replaced by adding to Items or to a List as needed)

[a] Whether the time difference matters depends on how the code is used and on the load. If it runs continuously to process many, many files, it might matter. As part of a desktop app, probably not.

0
On

You were very close. I would do something like this:

    Dim pathx As String = "C:\temp\"
    Dim directory = New IO.DirectoryInfo(pathx)
    Dim from_date As DateTime = DateTime.Now.AddHours(-24)
    Dim to_date As DateTime = DateTime.Now
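    ' Compare names case-insensitively so that "Bas123.PDF" matches as well as "bas123.pdf".
    Dim files = directory.GetFiles().Where(Function(file) file.LastWriteTime >= from_date AndAlso file.LastWriteTime <= to_date _
        AndAlso file.Name.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) _
        AndAlso (file.Name.StartsWith("fes", StringComparison.OrdinalIgnoreCase) OrElse file.Name.StartsWith("bas", StringComparison.OrdinalIgnoreCase)))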
    For Each filx In files
        ListBox1.Items.Add(filx)
    Next
1
On

The GetFiles method that you're calling is overloaded and allows you to specify a single search pattern to match. You can either call that method twice (once for each pattern) and then combine the results, or retrieve all files with a single call as you do now and then add another condition to your Where call to filter by name as well as date/time.
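
As a rough sketch of the first option (one GetFiles call per pattern, results combined, then the date filter applied), something like the following should work. The module name FilterPdfFiles and the helper GetRecentFiles are made up for illustration; the folder and the patterns are the ones from the question.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module FilterPdfFiles
    ' Hypothetical helper: calls GetFiles once per search pattern, concatenates
    ' the results, and keeps only files whose LastWriteTime lies in the window.
    Function GetRecentFiles(folder As String, patterns As String(),
                            fromDate As DateTime, toDate As DateTime) As FileInfo()
        Dim dirInfo = New DirectoryInfo(folder)
        Return patterns.
            SelectMany(Function(p) dirInfo.GetFiles(p)).
            Where(Function(f) f.LastWriteTime >= fromDate AndAlso f.LastWriteTime <= toDate).
            ToArray()
    End Function

    Sub Main()
        ' Folder and patterns taken from the question; last 24 hours only.
        Dim recent = GetRecentFiles("C:\temp\", {"fes*.pdf", "bas*.pdf"},
                                    DateTime.Now.AddHours(-24), DateTime.Now)
        For Each f In recent
            Console.WriteLine(f.Name)
        Next
    End Sub
End Module
```

If two patterns could ever match the same file, that file would appear twice in the result, so overlapping patterns would need de-duplicating (for example by FullName). In a WinForms app, the array could be passed to ListBox1.Items.AddRange instead of being written to the console.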