Is there a way to run Rcrawler without downloading all the HTMLs?


I'm running Rcrawler on a very large website, so it takes a very long time (3+ days with the default page depth). Is there a way to skip downloading all the HTML pages to make the process faster?

I only need the URLs that are stored in the INDEX. Or can anyone recommend another way to make Rcrawler run faster?

I have tried running it with a smaller page depth (5), but it is still taking forever.


There is 1 best solution below

Janush On

I am dealing with the same issue. Depending on the source, I sometimes even run it at a depth of 1.
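
For what it's worth, here is a rough sketch of the settings that usually matter for speed. It is not a tested recipe: it assumes a recent Rcrawler version in which `Rcrawler()` accepts the `saveOnDisk` argument, and the site URL and the `/blog/` filter are placeholders. Keep in mind that the crawler still has to fetch each page to discover its links, so `saveOnDisk = FALSE` should only avoid writing the HTML files to disk. The larger savings most likely come from a low `MaxDepth`, more parallel connections, and a narrower URL filter.

```r
library(Rcrawler)

# Placeholder site; replace with the real target (assumption for illustration).
site <- "https://www.example.com"

Rcrawler(
  Website        = site,
  no_cores       = 4,          # parallel worker processes
  no_conn        = 4,          # simultaneous connections
  MaxDepth       = 1,          # shallow crawl, as suggested in the answer
  crawlUrlfilter = "/blog/",   # hypothetical pattern: only follow matching URLs
  saveOnDisk     = FALSE       # assumed available: keep results in memory only
)

# Rcrawler creates INDEX in the global environment; the Url column holds
# the crawled addresses.
urls <- INDEX$Url
head(urls)
```

Be careful when raising `no_cores` and `no_conn` on a site you do not control, since more parallel requests put more load on the server.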

Best, Janusz