How to not show extracted links and scraped items?


Newbie here, running Scrapy on Windows. How can I avoid showing the extracted links and crawled items in the command window? I found comments in the "parse" section of this link: http://doc.scrapy.org/en/latest/topics/commands.html, but I'm not sure whether it's relevant and, if so, how to apply it. Here is more detail with part of the code, starting from my second AJAX request (in the first AJAX request, the callback function is "first_json_response"):

def first_json_response(self, response):
    try:
        data = json.loads(response.body)
    except ValueError:
        return  # response body was not valid JSON
    meta = {'results': data['results']}
    yield Request(url=url, callback=self.second_json_response,
                  headers={'x-requested-with': 'XMLHttpRequest'}, meta=meta)

def second_json_response(self, response):
    meta = response.meta
    try:
        data2 = json.loads(response.body)
    ...

The "second_json_response" is to retrieve the response from the requested result in first_json_response, as well as to load the new requested data. "meta" and "data" are then both used to define items that need to be crawled. Currently, the meta and links are shown in the windows terminal where I submitted my code. I guess it is taking up some extra time for computer to show them on the screen, and thus want them to disappear. I hope by running scrapy on a kinda-of batch mode will speed up my lengthy crawling process.

Thanks! I really appreciate your comment and suggestion!


There are 2 answers below.

Answer 1

From scrapy documentation:

"You can set the log level using the –loglevel/-L command line option, or using the LOG_LEVEL setting."

So append --loglevel=ERROR (or -L ERROR) to your scrapy crawl command. That should make all the INFO output disappear from your command line, but I don't think this will speed things up much.
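If you prefer to set this in code rather than on the command line, a minimal sketch is to raise the level of Scrapy's logger before the crawl starts (for example at the top of settings.py or your run script); this is equivalent to the LOG_LEVEL = 'ERROR' setting:

```python
import logging

# Suppress Scrapy's DEBUG/INFO output (crawled URLs, scraped items, stats),
# keeping only errors. Same effect as LOG_LEVEL = 'ERROR' in settings.py
# or `scrapy crawl <spider> --loglevel=ERROR` on the command line.
logging.getLogger('scrapy').setLevel(logging.ERROR)
```
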

Answer 2

In your pipelines.py file, try using something like:

import json

class JsonWriterPipeline(object):

    def __init__(self):
        # Open in text mode ('w'), since json.dumps returns a str,
        # not bytes; 'wb' would fail under Python 3.
        self.file = open('items.jl', 'w')

    def process_item(self, item, spider):
        # Serialize each item as one JSON object per line (JSON Lines).
        line = json.dumps(dict(item)) + "\n"
        self.file.write(line)
        return item

This way, when you yield an item from your spider class, it will be written to items.jl.
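Note that Scrapy only runs a pipeline if it is registered in settings.py. A minimal sketch, assuming the class above lives in your project's pipelines.py ("myproject" is a placeholder for your actual project package name):

```python
# In settings.py: enable the pipeline so Scrapy calls process_item()
# for every yielded item. The value (300) is a priority from 0-1000;
# lower numbers run first when several pipelines are enabled.
ITEM_PIPELINES = {
    'myproject.pipelines.JsonWriterPipeline': 300,
}
```
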

Hope that helps.