I am trying to crawl the URL https://www.randolphnj.org/, but the fetch fails with this error:
2020-09-22 15:03:08,395 INFO httpclient.Http: http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2020-09-22 15:03:08,395 INFO httpclient.Http: http.enable.cookie.header = true
2020-09-22 15:03:08,399 INFO conf.Configuration: found resource httpclient-auth.xml at file:/tmp/hadoop-unjar7802696204891280694/httpclient-auth.xml
Fetch failed with protocol status: exception(16), lastModified=0: Http code=406, url=https://www.randolphnj.org/
What is the reason for this error? Please help me solve it.
Most likely the server is blocking requests whose HTTP "User-Agent" request header includes the string "Nutch". I was able to reproduce the behavior using wget: