How to optimize Scrapyd settings for 200+ spiders


My Scrapyd instance runs 200 spiders at once every day. Yesterday the server crashed because it ran out of RAM.

I am using the default Scrapyd settings:

[scrapyd]
http_port  = 6800
debug      = off
#max_proc  = 1
eggs_dir   = /var/lib/scrapyd/eggs
dbs_dir    = /var/lib/scrapyd/dbs
items_dir  = /var/lib/scrapyd/items
logs_dir   = /var/log/scrapyd

Here is the code I use to schedule all the spiders:

import urllib
import urllib2

url = 'http://localhost:6800/schedule.json'
crawler = self.crawler_process.create_crawler()
# Schedule every spider in the project through the Scrapyd API
for s in crawler.spiders.list():
    values = {'project': 'myproject', 'spider': s}
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)

How can I optimize the Scrapyd settings to handle 200+ spiders?

Thanks

1 Answer

I'd first try running scrapy crawl with the --profile option on those spiders and examine the results to see what takes most of the memory. In general, Scrapy should just pipe and store data; it should not accumulate data in memory.
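
As a rough sketch (the spider name and stats file below are placeholders), you could profile a single run and then inspect the dump with pstats:

# Profile one run of one spider; names here are only examples:
#   scrapy crawl some_spider --profile=some_spider.cprofile
import pstats

stats = pstats.Stats('some_spider.cprofile')
# Show the 20 most expensive calls by cumulative time
stats.sort_stats('cumulative').print_stats(20)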

Otherwise: by default, Scrapyd will run 4 processes per available CPU. This can be adjusted with the following settings (a sample configuration follows their descriptions):

max_proc: The maximum number of concurrent Scrapy processes that will be started. If unset or 0, it will use the number of CPUs available in the system multiplied by the value of the max_proc_per_cpu option. Defaults to 0.

max_proc_per_cpu: The maximum number of concurrent Scrapy processes that will be started per CPU. Defaults to 4.
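
If the server runs out of RAM with 200 spiders scheduled, the usual fix is to cap how many crawls run concurrently so the rest wait in Scrapyd's pending queue. A minimal sketch, reusing the paths from the question; the cap of 8 processes is only an illustration, so tune it to how much RAM one crawl needs on your machine:

[scrapyd]
http_port        = 6800
debug            = off
eggs_dir         = /var/lib/scrapyd/eggs
dbs_dir          = /var/lib/scrapyd/dbs
items_dir        = /var/lib/scrapyd/items
logs_dir         = /var/log/scrapyd
# Hard cap on concurrent Scrapy processes; remaining jobs stay queued
max_proc         = 8
# Only used when max_proc is 0: processes per CPU instead of a fixed cap
max_proc_per_cpu = 2

With a cap like this in place, scheduling all 200 spiders at once is fine: Scrapyd keeps them pending and starts new processes as slots free up, instead of launching everything at the same time.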