I am developing a web application that will read Excel files and make some validation.
I use ExcelDataReader
library, which lacks async methods.
Read()
method takes care of reading the next rows in a sheet, so it is mostly I/O bound (it just advances the stream to a certain position, so data from the row is available).
To increase my throughput, I wrapped it in Task.Run
, like:
await Task.Run(() => excelDataReader.Read());
From benchmarks I can see it executes faster (or at least no longer) than regular synchronous call excelDataReader.Read()
From posts I read I see Task.Run
is recommended only in UI scenarios, when we need to unblock the UI thread.
So, the question is does it make sense to wrap I/O bound operations in Task.Run
?
Or am I missing something?
EDIT:
benchmark results
So the second question would be - if my approach is just wrong, why it performs better?
ANOTHER EDIT:
Below is benchmark code that showed that asynchronous version is much faster:
using BenchmarkDotNet.Attributes;
using ExcelDataReader;
using System.Text;
namespace ExcelDataReaderTests;
public class ExcelDataReaderTester
{
private const string DirectoryWithManyBigExcelFiles = @"Directory With Many Big Excel Files";
[Benchmark]
public async Task TestAsync()
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var directory = Directory.GetFiles(DirectoryWithManyBigExcelFiles);
var tasks = new List<Task>();
foreach (var file in directory)
{
tasks.Add(UseExcelDataReadAsync(file));
}
await Task.WhenAll(tasks);
}
[Benchmark]
public void Test()
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var directory = Directory.GetFiles(DirectoryWithManyBigExcelFiles);
foreach (var file in directory)
{
UseExcelDataRead(file);
}
}
private static void UseExcelDataRead(string filePath)
{
var fileStream = File.OpenRead(filePath);
var excelDataReader = ExcelReaderFactory.CreateReader(fileStream);
while (excelDataReader.Read())
{
var x = excelDataReader.GetFieldType(0);
}
}
private static async Task UseExcelDataReadAsync(string filePath)
{
var fileStream = File.OpenRead(filePath);
var excelDataReader = ExcelReaderFactory.CreateReader(fileStream);
while (await Task.Run(() => excelDataReader.Read()))
{
var x = excelDataReader.GetFieldType(0);
}
}
}
And the benchmark are run with:
BenchmarkRunner.Run(typeof(ExcelDataReaderTester).Assembly);
I am using BechmarkDotNet for benchmarking.
ExcelDataReader loads all data in memory so it makes no sense to provide asynchronous methods. The
xlsx
format is a ZIP package containing XML files which can't be read line by line. There's no IO involved inExcelDataReader.Read
.The real IO is performed by
CreateReader
. Unfortunately there's noCreateReaderAsync
and no plan to add this yetIn general, pushing an IO operation to another thread won't make it faster.
In a desktop application, if you want to process the contents of an Excel file without blocking the UI thread, put the entire processing code into a method and call it with
Task.Run
. For more complex processing you can use eg DataFlow or Channels to execute multiple processing steps in the background.In a web application though, each request is server by a different ThreadPool thread. There's no UI thread to block.
await Task.Run(...)
simply moves execution from one ThreadPool thread to another.