As the following code shows, html in the rvest package uses htmlParse from the XML package.
html
function (x, ..., encoding = NULL)
{
parse(x, XML::htmlParse, ..., encoding = encoding)
}
<environment: namespace:rvest>
htmlParse
function (file, ignoreBlanks = TRUE, handlers = NULL, replaceEntities = FALSE,
asText = FALSE, trim = TRUE, validate = FALSE, getDTD = TRUE,
isURL = FALSE, asTree = FALSE, addAttributeNamespaces = FALSE,
useInternalNodes = TRUE, isSchema = FALSE, fullNamespaceInfo = FALSE,
encoding = character(), useDotNames = length(grep("^\\.",
names(handlers))) > 0, xinclude = TRUE, addFinalizer = TRUE,
error = htmlErrorHandler, isHTML = TRUE, options = integer(),
parentFirst = FALSE)
.....
So, for the following URL:
myurl <- "http://www.nepalstock.com.np/"
parse_XML <- htmlParse(myurl)  # runs without error
parse_rvest <- html(myurl)     # throws an Internal Server Error
Error in parse.response(r, parser, encoding = encoding) :
server error: (500) Internal Server Error
Any idea why the two calls behave differently?
Reset the default user-agent of the underlying httr::GET request; then it works. That the fix is needed at all suggests the server rejects httr's default user-agent string.
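A minimal sketch of that fix, assuming the rvest version shown in the question, where the extra arguments of html() are forwarded to httr::GET. The user-agent string is only an illustrative value, and setting it globally with set_config() is one possible alternative to passing it per call:

```r
library(httr)
library(rvest)

myurl <- "http://www.nepalstock.com.np/"

# Illustrative browser-like string (an assumption); any value the server accepts should do.
ua <- user_agent("Mozilla/5.0 (X11; Linux x86_64)")

# Option 1: override the user-agent for this request only.
# html() passes `...` on to the underlying httr::GET call.
parse_rvest <- html(myurl, ua)

# Option 2: override it for every later httr request in this session.
set_config(ua)
parse_rvest <- html(myurl)
```

Option 1 keeps the change local to one call, while option 2 affects all later requests made through httr.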
Note that for debugging purposes, you can add verbose() to html(...).
Update: the same fix can be applied using the new rvest/xml2/curl combo.
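A minimal sketch of how this might look, assuming xml2's read_html() reading from a curl connection whose handle carries a custom user-agent. The user-agent string is illustrative, and this is not necessarily the exact code the answer had in mind:

```r
library(xml2)
library(curl)

myurl <- "http://www.nepalstock.com.np/"

# Create a curl handle with a browser-like user-agent (illustrative value, an assumption).
h <- new_handle(useragent = "Mozilla/5.0 (X11; Linux x86_64)")

# Open the URL through that handle and parse the response as HTML.
parse_rvest <- read_html(curl(myurl, handle = h))
```

The resulting document can then be queried with the usual rvest functions, such as html_nodes().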