Is there a way to run Rcrawler without downloading all the HTMLs?

Asked by Yannick on 27 May 2019 at 13:09 (191 views)

I'm running Rcrawler on a very large website, and the crawl takes a very long time (3+ days with the default page depth). Is there a way to skip downloading all the HTML pages to make the process faster?

I only need the URLs that are stored in the INDEX. Or can anyone recommend another way to make Rcrawler run faster?

I have tried running it with a smaller page depth (5), but it is still taking forever.

There is 1 answer below

Answered by Janush on 03 June 2019 at 10:03

I am dealing with the same issue. Depending on the source, in some cases I am even running at depth 1.

Best, Janusz
