I am using Nutch 1.10 to crawl websites for my organization, on a system with 16 GB of RAM. Currently Nutch uses only 3-4 GB of RAM while crawling, and the crawl takes almost 10 hours to finish. Is there some way to configure Nutch to use more RAM (say, 12 GB) for the same task? All suggestions are welcome!
1 Answer
Under the assumption that the script bin/nutch or bin/crawl is used for crawling in local mode (no Hadoop cluster): the environment variable NUTCH_HEAPSIZE defines the JVM heap size in MB. The bin/nutch script passes this value to the JVM as -Xmx${NUTCH_HEAPSIZE}m, so setting it to 12000 allows the crawler to use up to roughly 12 GB of heap.
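A minimal sketch of how this could look, assuming a local-mode crawl started from the Nutch runtime directory (the seed directory `urls/`, crawl directory `crawl/`, and round count are illustrative placeholders, not values from the question):

```shell
# Set the Nutch heap size in MB before launching the crawl.
# bin/nutch reads NUTCH_HEAPSIZE and turns it into -Xmx${NUTCH_HEAPSIZE}m.
export NUTCH_HEAPSIZE=12000   # ~12 GB of heap

# Illustrative local-mode crawl: seed dir, crawl dir, number of rounds.
bin/crawl urls/ crawl/ 2
```

Note that a larger heap mainly helps memory-heavy phases such as parsing and indexing; it is not guaranteed to shorten the fetch phase, which is usually bound by network speed and politeness delays rather than RAM.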