Scrapyd, Celery and Django running with Supervisor - _GenericHTTPChannelProtocol Error


I'm using a project called Django Dynamic Scraper to build a basic web scraper on top of Django. Everything works fine in development, but when I set it up on my DigitalOcean VPS I run into issues.

I'm using Supervisor to keep three things running:

  1. Scrapyd on 0.0.0.0:6800
  2. Celery task scheduler
  3. Celery worker

Whenever Celery passes a scrape job to Scrapyd, the following error is logged to the Scrapyd log:

2017-08-29T08:49:06+0000 [twisted.python.log#info] "127.0.0.1" - - [29/Aug/2017:08:49:05 +0000] "POST /schedule.json HTTP/1.1" 200 3464 "-" "-"
2017-08-29T08:49:07+0000 [_GenericHTTPChannelProtocol,5,127.0.0.1] Unhandled Error
    Traceback (most recent call last):
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/twisted/web/http.py", line 2059, in allContentReceived
        req.requestReceived(command, path, version)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/twisted/web/http.py", line 869, in requestReceived
        self.process()
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/twisted/web/server.py", line 184, in process
        self.render(resrc)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/twisted/web/server.py", line 235, in render
        body = resrc.render(self)
    --- <exception caught here> ---
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 21, in render
        return JsonResource.render(self, txrequest).encode('utf-8')
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapyd/utils.py", line 20, in render
        r = resource.Resource.render(self, txrequest)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/twisted/web/resource.py", line 250, in render
        return m(request)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 49, in render_POST
        spiders = get_spider_list(project, version=version)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapyd/utils.py", line 137, in get_spider_list
        raise RuntimeError(msg.encode('unicode_escape') if six.PY2 else msg)
    exceptions.RuntimeError: Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/home/dean/website/venv/lib/python2.7/site-packages/scrapyd/runner.py", line 40, in <module>
        main()
      File "/home/dean/website/venv/lib/python2.7/site-packages/scrapyd/runner.py", line 37, in main
        execute()
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 148, in execute
        cmd.crawler_process = CrawlerProcess(settings)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 243, in __init__
        super(CrawlerProcess, self).__init__(settings)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 134, in __init__
        self.spider_loader = _get_spider_loader(settings)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 330, in _get_spider_loader
        return loader_cls.from_settings(settings.frozencopy())
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 61, in from_settings
        return cls(settings)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 25, in __init__
        self._load_all_spiders()
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
        for module in walk_modules(name):
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
        submod = import_module(fullpath)
      File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/dynamic_scraper/spiders/checker_test.py", line 9, in <module>
        from dynamic_scraper.spiders.django_base_spider import DjangoBaseSpider
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/dynamic_scraper/spiders/django_base_spider.py", line 13, in <module>
        django.setup()
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/django/__init__.py", line 22, in setup
        configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/django/conf/__init__.py", line 56, in __getattr__
        self._setup(name)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/django/conf/__init__.py", line 41, in _setup
        self._wrapped = Settings(settings_module)
      File "/home/dean/website/venv/local/lib/python2.7/site-packages/django/conf/__init__.py", line 110, in __init__
        mod = importlib.import_module(self.SETTINGS_MODULE)
      File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
    ImportError: No module named IG_Tracker.settings

The final line of the traceback shows that the process spawned by Scrapyd cannot import my Django project's settings module (`IG_Tracker.settings`) into the Scrapy project. My Scrapy project is located inside one of my Django apps, as recommended by Django Dynamic Scraper.

Here is the part of my Scrapy settings file that hooks in the Django settings (this works in development):

import os
import sys

sys.path.append('../../../IG_Tracker/')
os.environ['DJANGO_SETTINGS_MODULE'] = 'IG_Tracker.settings'
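For comparison, one common way to make this hook independent of the working directory (which differs when Scrapyd spawns the runner) is to derive the project root from `__file__` instead of a hard-coded relative path. This is only a sketch; the number of `dirname()` calls is an assumption and has to match the actual directory layout:

```python
import os
import sys

# Build an absolute path to the Django project root from this file's
# location, rather than relying on the process's current working directory.
# ASSUMPTION: this settings file sits three directories below the directory
# that contains the IG_Tracker package; adjust the dirname() nesting to fit.
PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(
    os.path.dirname(os.path.abspath(__file__)))))
sys.path.insert(0, PROJECT_ROOT)
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'IG_Tracker.settings')
```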

My Scrapyd Supervisor config:

[program:scrapyd]
directory=/home/dean/website/instagram/ig_scraper
command=/home/dean/website/venv/bin/scrapyd -n
environment=MY_SETTINGS=/home/dean/website/IG_Tracker/settings.py
user=dean
autostart=true
autorestart=true
redirect_stderr=true
numprocs=1
stdout_logfile=/home/dean/website/scrapyd.log
stderr_logfile=/home/dean/website/scrapyd.log
startsecs=10
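For what it's worth, Supervisor's `environment=` line accepts several comma-separated `KEY="value"` pairs, so the settings module and import path can be handed to Scrapyd directly rather than through a custom `MY_SETTINGS` variable. A sketch, untested on this setup; the `PYTHONPATH` value assumes `IG_Tracker` lives directly under `/home/dean/website`:

```ini
[program:scrapyd]
directory=/home/dean/website/instagram/ig_scraper
command=/home/dean/website/venv/bin/scrapyd -n
; Multiple variables are comma-separated; values may be quoted.
environment=DJANGO_SETTINGS_MODULE="IG_Tracker.settings",PYTHONPATH="/home/dean/website"
user=dean
autostart=true
autorestart=true
```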
