Scrapy: access to log count while running on Scrapinghub


I have a small Scrapy extension which looks into the crawler's stats object and sends me an email if the crawler has emitted log messages of a certain level (e.g. WARNING, ERROR, CRITICAL).

These stats are accessible via the crawler's stats object (crawler.stats.get_stats()), e.g.:

crawler.stats.get_stats().items()
 [..]
 'log_count/DEBUG': 9,
 'log_count/ERROR': 2,
 'log_count/INFO': 4,
 [..]
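For context, a minimal sketch of the kind of extension described above might look like the following. The class name, recipient address and error threshold are my own assumptions, not part of the question; it only relies on the stats object, Scrapy's MailSender and the spider_closed signal:

from scrapy import signals
from scrapy.mail import MailSender


class LogStatsMailer(object):
    """Hypothetical extension: mail the log counts when the spider closes."""

    def __init__(self, crawler):
        self.crawler = crawler
        self.mailer = MailSender.from_settings(crawler.settings)
        crawler.signals.connect(self.spider_closed, signal=signals.spider_closed)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def spider_closed(self, spider):
        stats = self.crawler.stats.get_stats()
        errors = stats.get('log_count/ERROR', 0) + stats.get('log_count/CRITICAL', 0)
        if errors:
            body = '\n'.join('%s: %s' % (k, v) for k, v in stats.items()
                             if k.startswith('log_count/'))
            self.mailer.send(to=['me@example.com'],  # assumed address
                             subject='%s finished with %d errors' % (spider.name, errors),
                             body=body)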

If I run the spider on Scrapinghub, the log stats are not there. There are a lot of other things (e.g. exception counts, etc.), but the log counts are missing. Does anyone know how to get them there, or how to access them on Scrapinghub?

I've also checked the "Dumping Scrapy stats" output after a spider closes. If I run it on my machine, the log counts are there; if I run it on Scrapinghub, they are missing.

2 Answers

Accepted answer

This might also help someone else. I wrote a small extension that collects the log counts and saves them in the stats dict under its own prefix.

To use it, save it to a file (e.g. loggerstats.py) and activate it as an extension in your crawler's settings.py:

EXTENSIONS = {
    'loggerstats.LoggerStats': 10,
}

The script:

from scrapy import log
from scrapy.log import level_names
from twisted.python import log as txlog


class LoggerStats(object):
    """Count log messages of a given level (and above) into the crawler stats."""

    def __init__(self, crawler, prefix='stats_', level=log.INFO):
        self.level = level
        self.crawler = crawler
        self.prefix = prefix
        # Register our observer with Twisted's logging system so every
        # log event also passes through self.emit().
        txlog.startLoggingWithObserver(self.emit, setStdout=False)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def emit(self, ev):
        level = ev.get('logLevel')
        # Events not emitted through Scrapy's log.msg() may carry no level.
        if level is not None and level >= self.level:
            sname = '%slog_count/%s' % (self.prefix, level_names.get(level, level))
            self.crawler.stats.inc_value(sname)

It then counts log messages and maintains the counts in the crawler stats. For example:

stats_log_count/INFO: 10
stats_log_count/WARNING: 1
stats_log_count/CRITICAL: 5
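Because of the prefix, these counters don't clash with the built-in log_count/* keys, and they can be read back the same way, e.g. in your email extension (the default value of 0 is an assumption about how you want missing keys handled):

warnings = crawler.stats.get_value('stats_log_count/WARNING', 0)
errors = crawler.stats.get_value('stats_log_count/ERROR', 0)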

Answer

The problem here is that Scrapy populates these stats in its log observer, but Scrapinghub isn't using the default log observer. Reporting this on their forum is probably the best option; you can also link this question there.
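As a side note for newer Scrapy versions (1.0+), which route messages through Python's standard logging module rather than a Twisted log observer, a similar count could be kept with a plain logging.Handler. This is only an untested sketch along the same lines as the accepted answer, not something Scrapinghub provides:

import logging


class StatsLogHandler(logging.Handler):
    """Hypothetical stdlib-logging equivalent of the observer above."""

    def __init__(self, crawler, prefix='stats_', level=logging.INFO):
        logging.Handler.__init__(self, level)
        self.crawler = crawler
        self.prefix = prefix

    def emit(self, record):
        key = '%slog_count/%s' % (self.prefix, record.levelname)
        self.crawler.stats.inc_value(key)


# e.g. attached from an extension's from_crawler():
#     logging.root.addHandler(StatsLogHandler(crawler))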