Call a method after ThreadPool.QueueUserWorkItem finished


I'm working on a console application written in C#.

The app goes through all drives and files and does some work on each file. Doing that on a single thread takes far too long.

So I decided to use the ThreadPool to handle it, like so:

class Program {
    static void Main(string[] args) {
        foreach (var d in DriveInfo.GetDrives()) {
            ThreadPool.QueueUserWorkItem(x => Search(d.RootDirectory.GetDirectories()));
        }

        Console.WriteLine("Job is done.");
        Console.ReadKey();
    }

    private static void Search(DirectoryInfo[] dirs) {
        foreach (var dir in dirs) {
            try {
                foreach (var f in dir.GetFiles()) {
                    ThreadPool.QueueUserWorkItem(x => DoTheJob(f));
                }

                ThreadPool.QueueUserWorkItem(x => Search(dir.GetDirectories()));
            } catch (Exception) {
                continue;
            }
        }
    }       
}

The problem is that Console.WriteLine("Job is done.") executes before the queued work has finished. I've read some related questions and answers, but none of them addressed my problem.

How can I call a method after all threads in the ThreadPool finished their job?

Note: I have no idea how many work items will be queued, because I don't know how many files there are. Setting a timeout is not an option.


There are 2 best solutions below

0

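Here is an example of how you can use Parallel.ForEach to spread the load evenly. Parallel.ForEach returns only after every item has been processed, so anything placed after the call runs once all the work is done: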

static IEnumerable<FileSystemInfo> GetFileSystemObjects(DirectoryInfo dirInfo)
{
    foreach (var file in dirInfo.GetFiles())
        yield return file;

    foreach (var dir in dirInfo.GetDirectories())
    {
        foreach (var fso in GetFileSystemObjects(dir))
            yield return fso;
        yield return dir;
    }
}

static void Main(string[] args)
{
    var files = GetFileSystemObjects(new DirectoryInfo(<some path>)).OfType<FileInfo>();

    Parallel.ForEach(files, f =>
    {
        DoTheJob(f);
    });
}

If, however, DoTheJob contains I/O-bound operations, I'd consider handling it with await, as Henk Holterman suggested, because Parallel.ForEach takes no account of I/O load.
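
One practical wrinkle when walking whole drives: GetFiles() and GetDirectories() throw on folders the process cannot read, and C# does not allow a yield return inside a try block that has a catch clause. The following is only a sketch of one way around that, not part of the original answer. SafeEnumerateFiles, the Processed counter, the DoTheJob stand-in and the temporary directory tree are all hypothetical names and data introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class ParallelWalkDemo
{
    // Iterative walk that skips directories it cannot read.
    // Listings are fetched inside try/catch first, then yielded outside it.
    public static IEnumerable<FileInfo> SafeEnumerateFiles(DirectoryInfo root)
    {
        var pending = new Stack<DirectoryInfo>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            var dir = pending.Pop();
            FileInfo[] files;
            DirectoryInfo[] subDirs;
            try
            {
                files = dir.GetFiles();
                subDirs = dir.GetDirectories();
            }
            catch (UnauthorizedAccessException) { continue; }
            catch (IOException) { continue; }

            foreach (var file in files) yield return file;
            foreach (var sub in subDirs) pending.Push(sub);
        }
    }

    // Hypothetical stand-in for the question's DoTheJob: it only counts files.
    public static int Processed;
    public static void DoTheJob(FileInfo f) => Interlocked.Increment(ref Processed);

    public static void Main()
    {
        // Assumption: a small temporary tree stands in for a real drive.
        var root = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
        var sub = root.CreateSubdirectory("sub");
        File.WriteAllText(Path.Combine(root.FullName, "a.txt"), "a");
        File.WriteAllText(Path.Combine(sub.FullName, "b.txt"), "b");

        Parallel.ForEach(SafeEnumerateFiles(root), f => DoTheJob(f));

        // Reached only after every DoTheJob call has returned.
        Console.WriteLine($"Job is done. Files processed: {Processed}");
        root.Delete(recursive: true);
    }
}
```

For real use you would replace the temporary tree with d.RootDirectory for each drive from DriveInfo.GetDrives().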

17

Using QueueUserWorkItem() is the low-level, bare-bones approach. It gives you no control over your jobs: it's fire-and-forget.

Tasks run on top of the ThreadPool, and async/await can solve your problem here.

The top level:

var tasks = new List<Task>();
foreach (var d in DriveInfo.GetDrives())
{
    tasks.Add( Search(d.RootDirectory.GetDirectories()));
}
Task.WaitAll(tasks.ToArray());

and then your Search() becomes

private static async Task Search(DirectoryInfo[] dirs)
{
    ... 
    foreach(...)
    {
        await Task.Run(...);
    } 
    await Search(dir.GetDirectories());
}

Ideally, DoTheJob() should use async I/O, but otherwise you can await Task.Run(() => DoTheJob(f)).
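
To make the outline above concrete, here is one possible complete version. It is a sketch under stated assumptions, not necessarily what the answer's author had in mind. It differs from the outline in one respect: awaiting each Task.Run inside the loop would process the files one after another, so this version collects the tasks and awaits them together with Task.WhenAll. The DoTheJob stand-in, the Processed counter and the temporary directory tree are hypothetical and exist only so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class AsyncSearchDemo
{
    public static int Processed;

    // Hypothetical stand-in for the question's DoTheJob: it only counts files.
    static void DoTheJob(FileInfo f) => Interlocked.Increment(ref Processed);

    // The returned task completes once every file under dirs has been handled.
    public static async Task Search(DirectoryInfo[] dirs)
    {
        var tasks = new List<Task>();
        foreach (var dir in dirs)
        {
            try
            {
                // Each file is processed on a ThreadPool thread.
                foreach (var f in dir.GetFiles())
                    tasks.Add(Task.Run(() => DoTheJob(f)));

                // The recursive call hands back a task covering the subtree.
                tasks.Add(Search(dir.GetDirectories()));
            }
            catch (UnauthorizedAccessException) { /* skip unreadable directory */ }
            catch (IOException) { /* skip directory that vanished or failed */ }
        }
        await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        // Assumption: a small temporary tree stands in for the drives.
        var root = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
        var sub = root.CreateSubdirectory("sub");
        File.WriteAllText(Path.Combine(root.FullName, "a.txt"), "a");
        File.WriteAllText(Path.Combine(sub.FullName, "b.txt"), "b");

        // Blocks until the whole tree is done, like Task.WaitAll in the top level above.
        Search(new[] { root }).Wait();

        Console.WriteLine($"Job is done. Files processed: {Processed}");
        root.Delete(recursive: true);
    }
}
```

In this sketch the directory listing itself still runs sequentially on the calling thread; only the DoTheJob calls are spread across the pool. If the listing turns out to be the bottleneck, the recursive call could also be wrapped in Task.Run.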