Web scraping with R error

155 Views Asked by Jeisson At 08 September 2015 at 16:34

I am trying to scrape sainsburys.co.uk. I'm running the following code in R:

doc <- htmlTreeParse('http://www.sainsburys.co.uk/shop/gb/groceries/fruit-veg/all-fruit#langId=44&storeId=10151&catalogId=10122&categoryId=12545&parent_category_rn=12518&top_category=12518&pageSize=30&orderBy=FAVOURITES_FIRST&searchTerm')

rootNode <- xmlRoot(doc)

but I get this error:

Error in x$children[[1]] : subscript out of bounds

What am I doing wrong?


There is 1 best solution below

jlhoward On 08 September 2015 at 17:08 BEST ANSWER

You could try the httr library:

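library(XML)
library(httr)
url <- 'http://www.sainsburys.co.uk/shop/gb/groceries/fruit-veg/all-fruit#langId=44&storeId=10151&catalogId=10122&categoryId=12545&parent_category_rn=12518&top_category=12518&pageSize=30&orderBy=FAVOURITES_FIRST&searchTerm'
# Fetch the page with httr and take the response body as text
resp <- GET(url)
html <- content(resp, as = "text")
# Parse with the XML package so that XPath subsetting works on the result
doc <- htmlParse(html, asText = TRUE)
xmlValue(doc["//title"][[1]])
# [1] "All fruit | Sainsbury's"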
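
The error in the question is raised when xmlRoot() reaches for the first child of a document that has none, which suggests the parse came back empty. The same thing can happen when indexing an XPath result with [[1]] if nothing matched. Below is a minimal sketch of a more defensive version of the approach above. The helper names fetch_html and page_title and the sample markup are hypothetical, made up for illustration; only the httr and XML calls are real.

```r
library(XML)

# Hypothetical helper: download a page and return its body as text.
# Stops with an HTTP error instead of handing an empty body to the parser.
fetch_html <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Hypothetical helper: return the <title> text of an HTML string,
# or NA if the document has no <title> node.
page_title <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  nodes <- getNodeSet(doc, "//title")
  # Check the match count before indexing with [[1]]
  if (length(nodes) == 0) return(NA_character_)
  xmlValue(nodes[[1]])
}

# Made-up sample markup, not the real Sainsbury's page
sample_html <- "<html><head><title>All fruit</title></head><body><p>Apples</p></body></html>"
page_title(sample_html)
page_title("<html><body><p>No title here</p></body></html>")
```

For a live page, the two helpers would be combined as page_title(fetch_html(url)), assuming the site serves the content without requiring cookies or JavaScript.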
