I'm trying to download approximately 45,000 image files from an API. Each image file is less than 50 KB. With my current code this takes 2-3 hours.
Is there a more efficient way to download them in C#?
private static readonly string baseUrl =
    "http://url.com/Handlers/Image.ashx?imageid={0}&type=image";

internal static void DownloadAllMissingPictures(List<ListObject> ImagesToDownload,
    string imageFolderPath)
{
    Parallel.ForEach(Partitioner.Create(0, ImagesToDownload.Count), range =>
    {
        for (var i = range.Item1; i < range.Item2; i++)
        {
            string ImageID = ImagesToDownload[i].ImageId;
            using (var webClient = new WebClient())
            {
                string url = String.Format(baseUrl, ImageID);
                string file = String.Format(@"{0}\{1}.jpg", imageFolderPath,
                    ImagesToDownload[i].ImageId);
                byte[] data = webClient.DownloadData(url);
                using (MemoryStream mem = new MemoryStream(data))
                {
                    using (var image = Image.FromStream(mem))
                    {
                        image.Save(file, ImageFormat.Jpeg);
                    }
                }
            }
        }
    });
}
Likely not.
One thing to think about, though: stop using WebClient. It was replaced by HttpClient a long time ago; you just missed the memo. I suggest a quick run through the documentation.
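For reference, here is a minimal sketch of what an HttpClient version could look like. It reuses one shared client, caps the number of in-flight requests with a SemaphoreSlim (the limit of 32 and the method name DownloadAllMissingPicturesAsync are my own picks, not anything from your code), and assumes the handler already serves JPEG bytes so the System.Drawing re-encode can be dropped:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

internal static class ImageDownloader
{
    private static readonly string baseUrl =
        "http://url.com/Handlers/Image.ashx?imageid={0}&type=image";

    // One shared HttpClient for the whole run; a new instance per request wastes sockets.
    private static readonly HttpClient httpClient = new HttpClient();

    internal static async Task DownloadAllMissingPicturesAsync(
        List<ListObject> imagesToDownload, string imageFolderPath)
    {
        // Cap the number of requests in flight instead of starting all 45,000 at once.
        using (var throttle = new SemaphoreSlim(32))
        {
            var tasks = imagesToDownload.Select(async item =>
            {
                await throttle.WaitAsync();
                try
                {
                    string url = string.Format(baseUrl, item.ImageId);
                    string file = Path.Combine(imageFolderPath, item.ImageId + ".jpg");

                    // Assumption: the handler already returns JPEG bytes, so they can be
                    // written straight to disk without decoding and re-encoding the image.
                    byte[] data = await httpClient.GetByteArrayAsync(url);
                    File.WriteAllBytes(file, data);
                }
                finally
                {
                    throttle.Release();
                }
            });

            await Task.WhenAll(tasks);
        }
    }
}

Dropping the Image.FromStream/Image.Save round trip alone saves a decode and re-encode per file; if the handler can return something other than JPEG, keep that step.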
Regardless of what you think Parallel.ForEach does for you, you are limited by the parallel connection settings (ServicePointManager, HttpClientHandler).
You should read the documentation for those and experiment with higher limits, because right now they are quite likely capping your parallelism at a low number, and the server can probably handle 3-4 times that limit.
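As a rough sketch of what raising those limits looks like (32 is an arbitrary starting value; measure what the server actually tolerates):

using System.Net;
using System.Net.Http;

// .NET Framework: WebClient/HttpWebRequest default to 2 connections per host,
// which throttles any amount of Parallel.ForEach layered on top of it.
ServicePointManager.DefaultConnectionLimit = 32;

// HttpClient on .NET Core / .NET 5+: the limit lives on the handler instead.
var handler = new HttpClientHandler { MaxConnectionsPerServer = 32 };
var client = new HttpClient(handler);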
The question "Maximum concurrent requests for WebClient, HttpWebRequest, and HttpClient" has a deeper explanation.