How to speed up IronPdf when using async/await


I'm trying to make a piece of code run faster. The code is already using async/await. But it's still slow.

So I tried to alter my foreach to use the new IAsyncEnumerable. However, I gained no performance from this, and the code still appears to run sequentially, which surprised me. I thought await foreach would run each iteration on its own thread.

Here's my attempt at speeding up the code.

var bag = new ConcurrentBag<IronPdf.PdfDocument>(); // probably don't need a ConcurrentBag
var foos = _dbContext.Foos;
await foreach (var fooPdf in GetImagePdfs(foos))
{
    bag.Add(fooPdf);
}

private async IAsyncEnumerable<IronPdf.PdfDocument> GetImagePdfs(IEnumerable<Foo> foos)
{
    foreach (var foo in foos)
    {
        var imagePdf = await GetImagePdf(foo);

        yield return imagePdf;
    }
}

private async Task<IronPdf.PdfDocument> GetImagePdf(Foo foo)
{
    using var imageStream = await _httpService.DownloadAsync(foo.Id);
    var imagePdf = await _pdfService.ImageToPdfAsync(imageStream);

    return imagePdf;
}

using IronPdf;
public class PdfService
{
    // this method is quite slow
    public async Task<PdfDocument> ImageToPdfAsync(Stream imageStream)
    {
        var imageDataURL = Util.ImageToDataUri(Image.FromStream(imageStream));
        var html = $@"<img style=""max-width: 100%; max-height: 70%;"" src=""{imageDataURL}"">";
        using var renderer = new HtmlToPdf(new PdfPrintOptions()
        {
            PaperSize = PdfPrintOptions.PdfPaperSize.A4,
        });
        return await renderer.RenderHtmlAsPdfAsync(html);
    }
}

I also gave Parallel.ForEach a try

Parallel.ForEach(foos, async foo =>
{
    var imagePdf = await GetImagePdf(foo);
    bag.Add(imagePdf);
});

However I keep reading that I shouldn't use async with it, so not sure what to do. Also the IronPdf library crashes when doing it that way.


There are 2 best solutions below

0
On BEST ANSWER

The problem with your foreach and await foreach approaches is they are going to execute sequentially (even though they take advantage of the async and await pattern). Essentially, await does exactly that, awaits.
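
A minimal timing sketch of that sequential behaviour, compared with starting every call before awaiting any of them. It assumes a hypothetical FakeGetImagePdfAsync that replaces the real download and render with a 200 ms Task.Delay; none of these names come from the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class SequentialVsConcurrent
{
    // Hypothetical stand-in for GetImagePdf: a 200 ms delay instead of real IO.
    public static async Task<string> FakeGetImagePdfAsync(int id)
    {
        await Task.Delay(200);
        return $"pdf-{id}";
    }

    // Awaits inside the loop: each call must finish before the next one starts.
    public static async Task<List<string>> SequentialAsync(IEnumerable<int> ids)
    {
        var results = new List<string>();
        foreach (var id in ids)
        {
            results.Add(await FakeGetImagePdfAsync(id));
        }
        return results;
    }

    // Starts every call first, then awaits them together.
    public static async Task<List<string>> ConcurrentAsync(IEnumerable<int> ids)
    {
        // ToList forces the Select to run, so all tasks are started here.
        var tasks = ids.Select(id => FakeGetImagePdfAsync(id)).ToList();
        string[] results = await Task.WhenAll(tasks);
        return results.ToList();
    }

    public static async Task Main()
    {
        var ids = Enumerable.Range(1, 10).ToList();

        var sw = Stopwatch.StartNew();
        await SequentialAsync(ids);
        Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        await ConcurrentAsync(ids);
        Console.WriteLine($"Concurrent: {sw.ElapsedMilliseconds} ms");
    }
}
```

With ten items the sequential loop should take roughly ten times the single delay, while the concurrent version should take roughly one delay. Exact timings vary by machine.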

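With regard to Parallel.ForEach, your suspicions are correct: it's not suitable for async methods and IO-bound workloads. Parallel.ForEach takes an Action<T> delegate, and passing an async lambda where an Action<T> is expected just creates an async void method. The loop treats each iteration as finished once the lambda reaches its first await, so every task runs unobserved (which has several disadvantages: nothing waits for the work to complete, and the caller cannot catch its exceptions).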
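
A small sketch of that pitfall, using a Task.Delay as a hypothetical stand-in for the download and render work (the class and method names here are illustrative, not from the question):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncVoidPitfall
{
    // Runs the loop and returns how many items were added by the time
    // Parallel.ForEach itself has returned.
    public static int CountRightAfterLoop(ConcurrentBag<int> bag)
    {
        // The async lambda is converted to Action<int>, i.e. an async void method.
        Parallel.ForEach(Enumerable.Range(0, 5), async i =>
        {
            await Task.Delay(500); // stand-in for the real IO work
            bag.Add(i);
        });

        // Parallel.ForEach does not wait for the delays, so the bag
        // is most likely still empty at this point.
        return bag.Count;
    }

    public static async Task Main()
    {
        var bag = new ConcurrentBag<int>();
        Console.WriteLine($"Right after Parallel.ForEach: {CountRightAfterLoop(bag)} items");

        // Only an arbitrary wait lets the unobserved work finish.
        await Task.Delay(1500);
        Console.WriteLine($"After waiting: {bag.Count} items");
    }
}
```

The caller has no task to await here, which is why the results appear only after an arbitrary wait.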

There are many approaches to take from here, but the simplest is to start each task hot, project them to a collection, and await them all to completion. This way you are letting the IO bound workloads offload (term used loosely) to an IO Completion Port, thus allowing any potential thread to go back to the thread pool to get reused by the Task Scheduler efficiently until the IO work completes.

Assuming there are no shared resources, just project the started tasks to an IEnumerable<Task<PdfDocument>> and use Task.WhenAll

Creates a task that will complete when all of the supplied tasks have completed.

// GetImagePdf (singular) returns Task<PdfDocument>; each call starts its task immediately.
var tasks = _dbContext.Foos.Select(x => GetImagePdf(x));
var results = await Task.WhenAll(tasks);

In the above scenario, when the Select is enumerated it calls the async method GetImagePdf for each item, so each Task is started hot, and the Task Scheduler takes care of scheduling any threads that are needed from the thread pool. As soon as any code awaits an IO job, a callback is registered with the operating system and the thread goes back to the pool to be reused, and so on. Task.WhenAll waits for all the tasks to complete or fault, then returns an array of the results in the same order as the tasks.
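
Since the question mentions that IronPdf crashed under parallel load, it may be worth capping how many calls are in flight at once. This is not part of the answer above, only a hedged sketch of one common way to do it with SemaphoreSlim. FakeGetImagePdfAsync is a hypothetical stand-in for GetImagePdf, and the limit of 4 is arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledWhenAll
{
    private static readonly object Gate = new object();
    private static int _running;
    private static int _peak;

    // Hypothetical stand-in for GetImagePdf; also records peak concurrency.
    private static async Task<string> FakeGetImagePdfAsync(int id)
    {
        lock (Gate) { _running++; _peak = Math.Max(_peak, _running); }
        await Task.Delay(100);
        lock (Gate) { _running--; }
        return $"pdf-{id}";
    }

    // Waits for a free slot before starting the work, and always frees it afterwards.
    private static async Task<string> ThrottledAsync(int id, SemaphoreSlim limiter)
    {
        await limiter.WaitAsync();
        try
        {
            return await FakeGetImagePdfAsync(id);
        }
        finally
        {
            limiter.Release();
        }
    }

    // Returns the number of results and the highest number of calls seen in flight.
    public static async Task<(int Count, int Peak)> RunAsync(int items, int maxInFlight)
    {
        lock (Gate) { _running = 0; _peak = 0; }
        using var limiter = new SemaphoreSlim(maxInFlight);
        var tasks = Enumerable.Range(1, items)
            .Select(id => ThrottledAsync(id, limiter))
            .ToList();
        string[] results = await Task.WhenAll(tasks);
        return (results.Length, _peak);
    }

    public static async Task Main()
    {
        var (count, peak) = await RunAsync(items: 20, maxInFlight: 4);
        Console.WriteLine($"{count} results, peak concurrency {peak}");
    }
}
```

All tasks are still created up front, but only maxInFlight of them get past WaitAsync at any moment, so the renderer never sees more than that many concurrent calls.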

0
On

Moving to IronPdf 2021.9 or later significantly improved multithreading support and removed deadlocks in my application.
It also measurably improved the async performance of IronPdf's HTML-to-PDF rendering in my application:

https://www.nuget.org/packages/IronPdf/

// PM> Install-Package IronPdf
using IronPdf;
 
var Renderer = new IronPdf.ChromePdfRenderer();
 
// All IronPdf Rendering methods have Async equivalents
var doc = await Renderer.RenderHtmlAsPdfAsync("<h1>Html with CSS and Images</h1>");

doc.SaveAs("example.pdf");

This is also related to an existing ticket:

Asynchronous Code Is Not Faster than the Synchronous Version