Here's what I'm trying to do:
- Get one HTML page from a URL that contains multiple links
- Visit each link
- Extract some data from each visited link and create an object from it
So far, all I have is the simple and slow way:
public List<Link> searchLinks(string name)
{
    List<Link> foundLinks = new List<Link>();

    // getHtmlDocument() just returns an HtmlDocument for the given url.
    HtmlDocument doc = getHtmlDocument(AU_SEARCH_URL + fixSpaces(name));
    var link_list = doc.DocumentNode.SelectNodes(@"/html/body/div[@id='parent-container']/div[@id='main-content']/ol[@id='searchresult']/li/h2/a");

    foreach (var link in link_list)
    {
        // TODO Threads
        // getObject() creates an object from the gathered data.
        foundLinks.Add(getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value)));
    }
    return foundLinks;
}
To make it faster/more efficient I need to use threads, but I'm not sure how to approach it: I can't just start threads at random, because I need to wait for them all to finish. thread.Join() kind of solves the 'wait for threads to finish' problem, but then I don't think it's fast anymore, because each thread would only be launched after the earlier one has finished.
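Roughly what I mean (just an illustration, not my real code; processLink is a placeholder for the per-link work): if each thread is joined right after it is started, the loop is effectively sequential again:

foreach (var link in link_list)
{
    // Starting a thread and immediately joining it means the next
    // iteration cannot begin until this one's work is done.
    var worker = new Thread(() => processLink(link));
    worker.Start();
    worker.Join();
}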
The simplest way to offload the work to multiple threads would be to use Parallel.ForEach() in place of your current loop. Something like this:
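A minimal sketch of that, reusing the names from your method; it assumes the HtmlAgilityPack types plus your getObject/getLatestEpisode helpers, and it needs using directives for System.Collections.Concurrent, System.Threading.Tasks and System.Linq. A ConcurrentBag<Link> is used because List<T>.Add is not safe to call from multiple threads:

public List<Link> searchLinks(string name)
{
    // Thread-safe collection; several iterations may Add at the same time.
    var foundLinks = new ConcurrentBag<Link>();

    HtmlDocument doc = getHtmlDocument(AU_SEARCH_URL + fixSpaces(name));
    var link_list = doc.DocumentNode.SelectNodes(@"/html/body/div[@id='parent-container']/div[@id='main-content']/ol[@id='searchresult']/li/h2/a");

    // Parallel.ForEach runs the body on thread-pool threads and
    // does not return until every iteration has finished.
    Parallel.ForEach(link_list, link =>
    {
        string href = link.Attributes["href"].Value;
        foundLinks.Add(getObject(link.InnerText, href, getLatestEpisode(href)));
    });

    return foundLinks.ToList();
}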
I'm not sure if there are other threading concerns in your overall code. (Note, for example, that this would no longer guarantee that the data would be added to foundLinks in the same order.) But as long as there's nothing explicitly preventing concurrent work from taking place, this would take advantage of threading over multiple CPU cores to process the work.
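If keeping the original order of the links does matter, one alternative sketch (again assuming your existing helpers, plus a using directive for System.Linq) is PLINQ with AsOrdered(), which parallelizes the work but returns the results in source order:

// AsOrdered() preserves the order of link_list in the resulting list.
List<Link> foundLinks = link_list
    .AsParallel()
    .AsOrdered()
    .Select(link => getObject(link.InnerText,
                              link.Attributes["href"].Value,
                              getLatestEpisode(link.Attributes["href"].Value)))
    .ToList();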