The raise DropItem below creates too much noise: it outputs the complete item object.
Question: How can we make it output just the string? Or is there another way to drop items in pipelines?
The result is now the whole object with all its values, which clutters the output. Ideally we would drop an item silently. We used delete() before, but this resulted in errors in later pipelines. Help appreciated.
# Duplicate checker based on https://scrapy2.readthedocs.io/en/latest/topics/item-pipeline.html
if item['sku'] in self.skus_seen:
    if "url" not in item or not item['url']:
        item['url'] = '???, plz store item url in spider'
    raise DropItem(f"Duplicate products {item['sku']} at {item['url']}")
A popular question and answer ;)
It is given here
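Implement a custom log formatter that logs dropped items at a lower level (Scrapy's default for dropped items is WARNING):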
import logging
from scrapy import logformatter

class PoliteLogFormatter(logformatter.LogFormatter):
    def dropped(self, item, exception, response, spider):
        return {
            'level': logging.INFO,
            'msg': logformatter.DROPPEDMSG,
            'args': {
                'exception': exception,
                'item': item,
            }
        }
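The formatter above only lowers the log level; the default DROPPEDMSG template still renders the whole item. If the goal is to log just the string passed to DropItem, one option is a shorter template that references only the exception. The sketch below is not taken from the linked answer. It uses only the standard library to show how such a dict would render, and the names SHORT_DROPPED_MSG and quiet_dropped are made up for illustration.

```python
import logging

# Assumption: a custom template that leaves out %(item)s, so only the
# DropItem message appears in the log line.
SHORT_DROPPED_MSG = "Dropped: %(exception)s"


def quiet_dropped(item, exception):
    """Build the kind of dict a LogFormatter.dropped() override could return."""
    return {
        "level": logging.INFO,
        "msg": SHORT_DROPPED_MSG,
        # The item stays in args but is never rendered, because the
        # template has no %(item)s placeholder.
        "args": {"exception": exception, "item": item},
    }


# Stand-ins for a scraped item and the exception raised in the pipeline.
item = {"sku": "A1", "url": "https://example.com/a1"}
entry = quiet_dropped(item, ValueError("Duplicate product A1"))

# The logging module applies %-formatting with the args mapping like this.
rendered = entry["msg"] % entry["args"]
print(rendered)
```

In a real project, the body of quiet_dropped would go inside the dropped() method of the LogFormatter subclass, and the class would be enabled through the LOG_FORMATTER setting, for example LOG_FORMATTER = 'myproject.logformatters.PoliteLogFormatter' (the module path here is hypothetical).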
Scrapy - Silently drop an item