I have a problem after a fresh installation of Nutch 1.19 and Solr 8.11.2. After running the crawl process, crawling finishes with a NullPointerException and the following error message:
```
Error running:
  /opt/solr/apache-nutch-1.19/bin/nutch fetch -Dsolr.server.url=http//localhost:8983/solr/nutch -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 crawl/segments/20230106121647 -threads 50
Failed with exit value 255.
```
Does anybody have an idea what causes this error?
The error message indicates that the memory (Java heap) is not sufficient to spin up 50 fetcher threads. You could try one of the following:

- reduce the number of fetcher threads by passing `--num-threads n_threads` to `bin/crawl`
- increase the Java heap size by setting the environment variable `NUTCH_HEAPSIZE` - the default is 4 GB, which should be sufficient even with 50 threads unless you have very large documents (e.g. PDF files) to parse and index

Both options are shown in the sketch after this list.
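For illustration, a minimal re-run combining both suggestions could look like the following. The seed directory (`urls/`), crawl directory (`crawl/`), number of rounds, and the concrete heap and thread values are assumptions for your setup, not values taken from your original command:

```sh
# Raise the Java heap available to Nutch; NUTCH_HEAPSIZE is interpreted
# in MB, so 8192 = 8 GB (pick a value your machine can afford).
export NUTCH_HEAPSIZE=8192

# Re-run the crawl with fewer fetcher threads (10 instead of 50).
# Side note: the URL scheme should be "http://", not "http//" as in
# the failing command above.
bin/crawl -i \
    -D solr.server.url=http://localhost:8983/solr/nutch \
    --num-threads 10 \
    -s urls/ crawl/ 2
```

The two knobs attack the same limit from opposite sides: fewer fetcher threads means fewer documents held in memory at once, while a larger `NUTCH_HEAPSIZE` simply gives those threads more room.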