I have a problem after a fresh installation of Nutch 1.19 and Solr 8.11.2. After running the crawl process, crawling finishes with a NullPointerException and the following error message:
```
Error running:
  /opt/solr/apache-nutch-1.19/bin/nutch fetch -Dsolr.server.url=http//localhost:8983/solr/nutch -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 crawl/segments/20230106121647 -threads 50
Failed with exit value 255.
```
Does anybody have an idea what causes this error?
The error message indicates that the memory (Java heap) is not sufficient to spin up 50 fetcher threads. You could try one of the following:

- reduce the number of fetcher threads by passing `--num-threads n_threads` to `bin/crawl`
- increase the Java heap size by setting the environment variable `NUTCH_HEAPSIZE` - the default is 4 GB, which should be sufficient even with 50 threads unless you have very large documents (e.g. PDF files) to parse and index

Both options are shown in the sketch after this list.
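For illustration, a minimal re-run combining both suggestions could look like the following. The seed directory (`urls/`), crawl directory (`crawl/`), number of rounds, and the concrete heap and thread values are assumptions for your setup, not values taken from your original command:

```sh
# Raise the Java heap available to Nutch; NUTCH_HEAPSIZE is interpreted
# in MB, so 8192 = 8 GB (pick a value your machine can afford).
export NUTCH_HEAPSIZE=8192

# Re-run the crawl with fewer fetcher threads (10 instead of 50).
# Side note: the URL scheme should be "http://", not "http//" as in
# the failing command above.
bin/crawl -i \
    -D solr.server.url=http://localhost:8983/solr/nutch \
    --num-threads 10 \
    -s urls/ crawl/ 2
```

The two knobs attack the same limit from opposite sides: fewer fetcher threads means fewer documents held in memory at once, while a larger `NUTCH_HEAPSIZE` simply gives those threads more room.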