Here's the context of the problem I'm facing:
I have 202 URLs stored in a vector and I'm trying to scrape information from them using a for loop.
The URLs are basically every product that shows up within this website: https://lista.mercadolivre.com.br/_CustId_38356530
I obtained them using this code:
library(rvest)
library(tidyverse)  # str_c(), str_subset(), tibble(), map_dfr(), %>%

get_products <- function(n_page) {
  cat("Scraping index", n_page, "\n")
  page <- str_c(
    "https://lista.mercadolivre.com.br/_Desde_",
    n_page,
    "_CustId_38356530_NoIndex_True"
  ) %>%
    read_html()
  # Keep only the product links (the ones carrying a tracking_id parameter)
  tibble(url = page %>%
    html_elements("a.ui-search-link") %>%
    html_attr("href") %>%
    str_subset("tracking_id") %>%
    unique())
}

# The listing shows 48 products per page, so _Desde_ takes 1, 49, 97, 145, 193
products_url <- map_dfr(seq(1, 49 * 4, by = 48), get_products)
Problem is: I keep getting an error:
error in open.connection(x, "rb") : HTTP error 404
I have read a few articles and Q&A threads discussing this problem, but I can't seem to find a solution that works for my case.
For example, someone suggested that the error happens when the page doesn't exist:
rvest Error in open.connection(x, "rb") : HTTP error 404
However, that's not the case here: when I visit the URLs that triggered the error, they load just fine.
Besides, if a missing page were the cause, rerunning the code should raise the error for the same positions in the vector, yet the failures seem to happen at random.
For example:
The first time I ran the code, I got the error on vector[6].
The second time I ran the same snippet, scraping vector[6] worked just fine.
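A quick check along these lines (my own sketch; standard_ad is the vector of product URLs) records which indices fail on a given run, and in my case the set changes between runs:

library(rvest)

# Collect the indices that error out on this run; rerunning produces a
# different set, suggesting the 404s are transient rather than tied to pages
failed <- integer(0)
for (i in seq_along(standard_ad)) {
  page <- try(read_html(standard_ad[i]), silent = TRUE)
  if (inherits(page, "try-error")) failed <- c(failed, i)
}
failed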
It was also suggested that I use try() or tryCatch() to keep the error from stopping the for loop.
And for that purpose, try() worked.
However, I'd prefer to avoid the error altogether, because otherwise I have to run the same snippet several times in order to scrape every value I need.
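The best I've come up with is to automate those reruns with a retry wrapper, sketched below (read_html_retry is just a name I made up, and the delays are guesses), on the assumption that the 404s are transient:

library(rvest)

# Hypothetical helper: re-attempt read_html() a few times, pausing between
# attempts, instead of failing on the first transient 404
read_html_retry <- function(url, attempts = 3, pause = 5) {
  for (k in seq_len(attempts)) {
    page <- try(read_html(url), silent = TRUE)
    if (!inherits(page, "try-error")) return(page)
    Sys.sleep(pause)  # back off before trying again
  }
  stop("Failed to read ", url, " after ", attempts, " attempts")
}

But that only treats the symptom, so I'd still like to understand the cause.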
Can anyone help me, please?
Why is it happening and what can I do to prevent it?
Here's the code I'm running, if it helps:
# Append the title scraped from each ad to standard_titles;
# try() keeps a single failed request from stopping the whole loop
standard_titles <- character(0)
for (i in seq_along(standard_ad)) {
  try({
    collectedtitles <- collect(standard_ad[i], ".ui-pdp-title")
    standard_titles <- append(standard_titles, collectedtitles)
  })
}
collect() being a function I created:
collect <- function(webpage, section) {
  page <- read_html(webpage)
  value <- html_node(page, section)
  html_text(value)  # return the text of the first node matching the selector
}

From the link you provided, I scraped the five available pages for that query without getting any error. Could you explain in more detail how you got your error?
Scraping the amount sold from the individual product pages with the polite package. polite was designed to be friendly towards the sites it scrapes, so it will be slower than rvest but more reliable in certain scenarios. I have scraped 20 pages this way without any issues. Run the previous code first and then something like the snippet below.
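A minimal sketch of that approach, assuming the amount sold sits in each product page's .ui-pdp-subtitle element (scrape_sold is an illustrative name, and the selector may need adjusting):

library(polite)
library(rvest)
library(purrr)

# Politely fetch one product page and extract the amount sold.
# bow() introduces the scraper to the host and checks robots.txt;
# scrape() performs the request, honouring the declared crawl delay,
# and returns NULL when scraping is not permitted
scrape_sold <- function(url) {
  page <- bow(url, delay = 5) %>%
    scrape()
  if (is.null(page)) return(NA_character_)
  page %>%
    html_node(".ui-pdp-subtitle") %>%
    html_text()
}

amount_sold <- map_chr(products_url$url, scrape_sold)

The built-in delay spaces the requests out, which may be exactly what prevents the intermittent 404s if the server is rate-limiting rapid-fire requests.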