I am using Nutch 1.10 to crawl websites for my organization, on a system with 16 GB of RAM. At present, Nutch uses only 3-4 GB of RAM while crawling, and the crawl takes almost 10 hours to finish. Is there a way to configure Nutch to use more than 12 GB of RAM for the same task? All suggestions are welcome!
Under the assumption that the script bin/nutch or bin/crawl is used for crawling in local mode (no Hadoop cluster): the environment variable NUTCH_HEAPSIZE defines the heap size in MB.
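
As a minimal sketch of the above, the variable can be exported in the shell before starting the crawl. The 12 GB figure is taken from the question and is only illustrative; the variable HEAP_GB is a hypothetical helper name, not something Nutch reads.

```shell
#!/bin/sh
# Assumption: about 12 GB of the machine's 16 GB should go to the Nutch JVM heap.
HEAP_GB=12   # hypothetical helper variable, used only in this sketch

# NUTCH_HEAPSIZE is given in MB, so convert GB to MB.
NUTCH_HEAPSIZE=$((HEAP_GB * 1024))
export NUTCH_HEAPSIZE

echo "NUTCH_HEAPSIZE=${NUTCH_HEAPSIZE} MB"

# Then start the crawl from the same shell with the usual
# bin/nutch or bin/crawl command so that it inherits the variable.
```

The value sets the maximum heap, so the crawl process may still use less than this if it does not need more.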