I am using VB.NET and have a handful of URLs that refuse to be crawled. I would really like to detect when a crawl returns a null response, but I'm having trouble figuring out how.
Code:
Public Sub crawler_ProcessPageCrawlCompleted(sender As Object, e As PageCrawlCompletedArgs)
    pageNumber += 1
    Try
        Dim crawledPage As CrawledPage = e.CrawledPage
        If (Not (crawledPage.HttpWebResponse Is Nothing) And Not (crawledPage.WebException Is Nothing)) Or crawledPage.HttpWebResponse.StatusCode <> HttpStatusCode.OK Then
            CrawlFailed(e.CrawledPage.ToString, Failed)
        Else
            If String.IsNullOrEmpty(crawledPage.Content.Text) Then
                CrawlFailed(e.CrawledPage.ToString, NoContent)
            Else
                StoreContent(e)
            End If
        End If
    Catch ex As Exception
        RichTextBox1.AppendText(e.CrawledPage.ToString & " - " & ex.Message & vbCrLf)
    End Try
End Sub
I put in the Try-Catch to capture that exception, but I would really rather capture it in my CrawlFailed subroutine so I can do something with that URL.
I have tried using GetResponseStream and Stream.Null, but I can't work out how to detect an empty stream :( I'm sure I'm just missing something; I've googled all over the place and the best I can find is this thread: crawledPage.HttpWebResponse is null in Abot.
However, that doesn't really explain HOW to detect and code against the result.
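The closest I've gotten is a sketch like this, checking Is Nothing first and using the short-circuiting OrElse operator so the StatusCode test never runs against a Nothing response, but I'm not sure it's the right approach:

Dim crawledPage As CrawledPage = e.CrawledPage
' Treat a Nothing response, a WebException, or a non-200 status as a failed crawl.
' OrElse short-circuits, so StatusCode is only read when HttpWebResponse exists.
If crawledPage.HttpWebResponse Is Nothing _
        OrElse crawledPage.WebException IsNot Nothing _
        OrElse crawledPage.HttpWebResponse.StatusCode <> HttpStatusCode.OK Then
    CrawlFailed(crawledPage.ToString(), Failed)
End If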
I had the same issue (on .NET Core). With a Fiddler session I could see that the response actually did come back, but it took a long time for the site to return the result.
Try setting config.HttpRequestTimeoutInSeconds to a higher value. That resolved my issue.
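For reference, here is roughly how the wiring looks; this is just a minimal sketch against the Abot 1.x API the question uses, and the 60-second timeout and example URI are placeholders to adjust for your own crawl:

Imports Abot.Crawler
Imports Abot.Poco

Dim config As New CrawlConfiguration()
' Give slow sites more time before Abot gives up and hands back
' a CrawledPage whose HttpWebResponse is Nothing.
config.HttpRequestTimeoutInSeconds = 60

' Abot 1.x takes the config plus Nothing for each component you
' want left at its default; newer versions accept just the config.
Dim crawler As New PoliteWebCrawler(config, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing)
AddHandler crawler.PageCrawlCompleted, AddressOf crawler_ProcessPageCrawlCompleted

Dim result As CrawlResult = crawler.Crawl(New Uri("http://example.com/"))

The default timeout is fairly short (15 seconds, if I remember correctly), so a slow site times out and you end up in exactly the Nothing-response branch the question describes.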