C# WPF Speedup (Thread) Total FileInfo.Length from Multiple Files


I'm trying to speed up the sum calculation over all files in all folders, recursively, under a given path.

Let's say I choose "E:\" as the folder. I get the entire recursive file list via "SafeFileEnumerator" into an IEnumerable in milliseconds (works like a charm).

Now I would like to gather the sum of all bytes of all files in this enumerable. Right now I loop over them via foreach and read new FileInfo(oFileInfo.FullName).Length for each file.

This works, but it is slow - it takes about 30 seconds. If I look up the space consumption via right-click > Properties on the same folders in Windows Explorer, I get the result in about 6 seconds (~1600 files, 26 gigabytes of data, on an SSD).

So my first thought was to speed up the gathering by using threads, but I get no speedup there.

The code without threads is below:

public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;

    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }

        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }

    return FolderSize;
}
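A side note on the loop above: SafeFileEnumerator already yields FileSystemInfo objects, and if (like the built-in enumerators) it constructs its FileInfo instances from the directory-scan data, their Length is typically already populated. Under that assumption, the extra new FileInfo(oFileInfo.FullName) round-trip to the file system can be skipped - a minimal sketch of the changed loop body:

```csharp
// Sketch: reuse the FileInfo the enumerator already produced instead of
// querying the file system a second time for every entry.
// Assumes SafeFileEnumerator yields FileInfo objects for files.
FileInfo oFile = oFileInfo as FileInfo;
if (oFile != null)            // null for directory entries
{
    FolderSize += oFile.Length;
}
```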

The multithreading code is below:

public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;

    int iCountTasks = 0;
    List<Thread> aThreads = new List<Thread>();

    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }

        if (iCountTasks < 10)
        {
            Interlocked.Increment(ref iCountTasks);
            FileSystemInfo oCurrent = oFileInfo; // local copy for the closure
            Thread oThread = new Thread(delegate()
            {
                try
                {
                    // Interlocked.Add avoids a data race on the shared total
                    Interlocked.Add(ref FolderSize, new FileInfo(oCurrent.FullName).Length);
                }
                catch (Exception oException)
                {
                    Debug.WriteLine(oException.Message);
                }

                Interlocked.Decrement(ref iCountTasks);
            });
            aThreads.Add(oThread);
            oThread.Start();
            continue;
        }

        try
        {
            Interlocked.Add(ref FolderSize, new FileInfo(oFileInfo.FullName).Length);
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }

    // wait for all workers, otherwise the total may be returned incomplete
    foreach (Thread oThread in aThreads)
    {
        oThread.Join();
    }

    return FolderSize;
}

Could someone please give me advice on how I could speed up the folder-size calculation?

Kind regards

Edit 1 (Parallel.ForEach suggestion - see comments)

public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;

    ParallelOptions oParallelOptions = new ParallelOptions();
    oParallelOptions.CancellationToken = oCancelToken.Token;
    oParallelOptions.MaxDegreeOfParallelism = System.Environment.ProcessorCount;

    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray();

    Parallel.ForEach(aFiles, oParallelOptions, oFileInfo =>
    {
        try
        {
            // Interlocked.Add avoids a data race on the shared total
            Interlocked.Add(ref FolderSize, new FileInfo(oFileInfo.FullName).Length);
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    });

    return FolderSize;
}
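For comparison, a PLINQ version of the same aggregation avoids the shared FolderSize variable entirely, because Sum merges per-thread partial results internally. This is only a sketch, assuming the SafeFileEnumerator from the question:

```csharp
// Sketch: PLINQ partitions the file list, sums lengths on worker threads,
// and combines the partial sums itself - no shared mutable state required.
long folderSize = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories)
    .AsParallel()
    .WithCancellation(oCancelToken.Token)
    .Sum(oFileInfo =>
    {
        try { return new FileInfo(oFileInfo.FullName).Length; }
        catch (IOException) { return 0L; }  // unreadable entries count as 0
    });
```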
There are 2 best solutions below

Side note about SafeFileEnumerator performance:

Getting an IEnumerable does not mean you have the entire collection, because it is a lazy proxy. Try the snippet below - I'm sure you'll see the performance difference (apologies if it doesn't compile; it is just to illustrate the idea):

var tmp = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray(); // fetch all records explicitly to populate the array
IEnumerable<FileSystemInfo> aFiles = tmp;

Now, about the actual result you want to achieve:

  1. If you just need file sizes, it is better to ask the OS about the file system rather than querying files one by one. I'd start with the DirectoryInfo class (see for instance http://www.tutorialspoint.com/csharp/csharp_windows_file_system.htm).
  2. If you need to calculate a checksum of each file, it will definitely be slow, because you have to load every file first (a lot of memory transfers). Threads are no booster here, because they will be limited by the OS file-system throughput, not by your CPU power.
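A minimal sketch of point 1, assuming .NET 4 or later: DirectoryInfo.EnumerateFiles streams FileInfo objects whose Length is filled in during the directory scan, so no per-file round-trip is needed:

```csharp
// Sketch: enumerate recursively and sum the lengths the scan already provides.
// Note: unlike SafeFileEnumerator, this throws on inaccessible subdirectories.
DirectoryInfo oRoot = new DirectoryInfo(@"E:\");
long folderSize = oRoot.EnumerateFiles("*", SearchOption.AllDirectories)
                       .Sum(oFile => oFile.Length);
```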
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.IO;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            long size = fetchFolderSize(@"C:\Test", new CancellationTokenSource());
        }

        public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
        {
            ParallelOptions po = new ParallelOptions();
            po.CancellationToken = oCancelToken.Token;
            po.MaxDegreeOfParallelism = System.Environment.ProcessorCount;

            long folderSize = 0;
            string[] files = Directory.GetFiles(Folder);

            // Sum the files in this folder; each worker keeps a thread-local
            // subtotal and the final Interlocked.Add merges them safely.
            Parallel.ForEach<string, long>(files,
                                           po,
                                           () => 0,
                                           (fileName, loop, fileSize) =>
                                           {
                                               fileSize += new FileInfo(fileName).Length; // accumulate, don't overwrite
                                               po.CancellationToken.ThrowIfCancellationRequested();
                                               return fileSize;
                                           },
                                           (finalResult) => Interlocked.Add(ref folderSize, finalResult)
                                           );

            string[] subdirEntries = Directory.GetDirectories(Folder);

            // Recurse into subdirectories in parallel, skipping reparse points
            // (junctions/symlinks) to avoid cycles and double counting.
            Parallel.For<long>(0, subdirEntries.Length, () => 0, (i, loop, subtotal) =>
            {
                if ((File.GetAttributes(subdirEntries[i]) & FileAttributes.ReparsePoint) !=
                    FileAttributes.ReparsePoint)
                {
                    subtotal += fetchFolderSize(subdirEntries[i], oCancelToken);
                }
                return subtotal; // keep the running subtotal even when skipping
            },
                (finalResult) => Interlocked.Add(ref folderSize, finalResult)
            );

            return folderSize;
        }
    }
}