Python pipelines duplicate checker, using "raise DropItem" but how do we make it pipe down?

raise DropItem below creates too much noise: it outputs the complete item object.

Question: How can we make it output just the string? Or is there another way to drop items in pipelines?

The result is now a whole object with all its values, which clutters the output. The wish is to drop an item silently. We used delete() before, but this resulted in errors in later pipelines. Help appreciated.

    # Duplicate checker based on https://scrapy2.readthedocs.io/en/latest/topics/item-pipeline.html
    if item['sku'] in self.skus_seen:
        if "url" not in item or not item['url']:
            item['url'] = '???, plz store item url in spider'
        raise DropItem(f"Duplicate products {item['sku']} at {item['url']}")

snh_nl On

A popular question and answer ;)

The solution is given in the question linked at the end of this answer.

Implement a custom log formatter:

    import logging
    from scrapy import logformatter

    class PoliteLogFormatter(logformatter.LogFormatter):
        def dropped(self, item, exception, response, spider):
            return {
                'level': logging.INFO,
                'msg': logformatter.DROPPEDMSG,
                'args': {
                    'exception': exception,
                    'item': item,
                }
            }
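Note that the formatter above only changes the log level. Scrapy's DROPPEDMSG template still contains %(item)s, so the whole item is printed whenever that level is shown. If the goal is to log just the DropItem string, one option is to return a message template without %(item)s. Below is a minimal, standard-library-only sketch of that idea. QUIET_DROPPED_MSG and quiet_dropped are hypothetical names, not part of Scrapy, and the sample item is made up. In a real project the returned dict would be the return value of dropped() on a LogFormatter subclass, registered through the LOG_FORMATTER setting.

```python
import logging

# Hypothetical template: like Scrapy's dropped message, but without %(item)s,
# so only the text passed to DropItem ends up in the log line.
QUIET_DROPPED_MSG = "Dropped: %(exception)s"


def quiet_dropped(item, exception, response=None, spider=None):
    """Return a log entry in the dict shape used by LogFormatter.dropped()."""
    return {
        "level": logging.INFO,
        "msg": QUIET_DROPPED_MSG,
        # 'item' is still passed along, but the template no longer prints it.
        "args": {"exception": exception, "item": item},
    }


# Made-up item and exception, standing in for what the pipeline would pass.
item = {"sku": "A1", "url": "https://example.com/a1", "description": "long text"}
error = Exception("Duplicate product A1 at https://example.com/a1")

entry = quiet_dropped(item, error)

# The logging module fills the template from the args dict in the same way.
rendered = entry["msg"] % entry["args"]
print(rendered)
```

Keys in args that the template does not reference are ignored, so the item can stay in the dict without appearing in the output.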

Scrapy - Silently drop an item