I'm building a small scraper that navigates through a set of URLs.
Currently I have something like:
public class MyScraper : WebScraper
{
    private Queue<string> _urlToParse = new Queue<string>();

    public override void Init()
    {
        // Initializing _urlToParse with more than 1000 URLs
        Request(_urlToParse.Dequeue(), Parse);
    }

    public override void Parse(Response response)
    {
        if (response.WasSuccessful)
        {
            // ...parsing
        }
        else
        {
            // logging the error
        }

        // Queue the next URL (guarding against an empty queue,
        // which would otherwise throw InvalidOperationException)
        if (_urlToParse.Count > 0)
            Request(_urlToParse.Dequeue(), Parse);
    }
}
But the Parse method isn't called when I receive a 404 error.
Consequences:
- I cannot log the error (and once the first Request call returns, I have no way to know whether it was successful).
- The next URL is not parsed.

I expected to land in the Parse method with response.WasSuccessful = false and then be able to check the status code.
How should I handle this 404?
The only way I could find to log the failed URL is to override the
Log(string Message, LogLevel Type) method. There doesn't appear to be a good reason to have response.WasSuccessful; as you said, it only appears to call Parse() when the request is successful.

Another option: WebScraper has a MaxHttpConnectionLimit property that you could use to make sure it only opens one connection at a time.
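A minimal sketch of that workaround, assuming the same WebScraper base class as above. The Log override and the MaxHttpConnectionLimit property come straight from this answer; the LogLevel.Error check and the idea of re-queueing the next URL from inside Log are assumptions about how failures surface in the log, so adjust them to whatever your library actually reports.

public class MyScraper : WebScraper
{
    private Queue<string> _urlToParse = new Queue<string>();

    public override void Init()
    {
        // One connection at a time, so a failed request can't
        // interleave with other in-flight requests
        // (property named in the answer above)
        MaxHttpConnectionLimit = 1;
        Request(_urlToParse.Dequeue(), Parse);
    }

    public override void Log(string Message, LogLevel Type)
    {
        base.Log(Message, Type);

        // Assumption: a failed request (e.g. a 404) shows up here as an
        // error-level entry. Record the failed URL from Message if you
        // need it, then keep the queue moving, since Parse will never
        // fire for this URL.
        if (Type == LogLevel.Error && _urlToParse.Count > 0)
            Request(_urlToParse.Dequeue(), Parse);
    }

    public override void Parse(Response response)
    {
        // ...parsing, then queue the next URL as before
        if (_urlToParse.Count > 0)
            Request(_urlToParse.Dequeue(), Parse);
    }
}

With MaxHttpConnectionLimit set to 1, requests complete strictly in order, so a 404 can only stall the single in-flight URL, and the Log override is what unblocks it.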