How to scrape anonymously using Scrapy, Tor, Privoxy & UserAgent? (Windows 10)

Question

1.3k Views Asked by J. Does At 28 June 2025 at 01:34

The answer to this question was hard to find because the information is scattered and question titles are sometimes misleading. The answer below gathers everything needed in one place.

**J. Does** · Answer 1

Your spider should look like this:

# based on https://doc.scrapy.org/en/latest/intro/tutorial.html

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            print('\n\nurl:', url)
            ## use one of the yields below

            # the downloader middleware will process the request
            yield scrapy.Request(url=url, callback=self.parse)

            # check whether Tor has changed the IP
            # (dont_filter=True so the repeated URL is not dropped as a duplicate)
            #yield scrapy.Request('http://icanhazip.com/', callback=self.is_tor_and_privoxy_used, dont_filter=True)


    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        print('\n\nSpider: Start')
        print('Is proxy in response.meta?: ', response.meta)
        print('user_agent is: ', response.request.headers['User-Agent'])
        print('\n\nSpider: End')
        self.log('Saved file --- %s' % filename)


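    def is_tor_and_privoxy_used(self, response):
        print('\n\nSpider: Start')
        # the response body is the public IP address seen by the remote server
        print("My IP is: " + response.text.strip())
        print("Is proxy in response.meta?: ", response.meta)  # no header available here
        print('\n\nSpider: End')
        # nothing is saved in this callback, so log the URL that was checked
        self.log('Checked IP via %s' % response.url)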

You will also need to add code to middlewares.py and settings.py. If you don't know how to do that, this will help you:

how to scrape anonymously using Scrapy Tor Privoxy & UserAgent? (Windows 10)
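
For reference, here is a minimal sketch of what those two files might contain. It is not taken from the linked answer. It assumes Privoxy is listening on its default address (127.0.0.1:8118) and has already been configured to forward traffic to Tor. The project name `myproject`, the class names, and the sample User-Agent strings are placeholders chosen for illustration.

```python
import random

# --- middlewares.py (sketch) ---

# Assumption: Privoxy runs locally on its default port and forwards to Tor.
PRIVOXY_URL = "http://127.0.0.1:8118"

# Illustrative sample values only; replace with the User-Agent strings you want.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


class ProxyMiddleware:
    """Send every request through Privoxy by setting request.meta['proxy']."""

    def process_request(self, request, spider):
        request.meta["proxy"] = PRIVOXY_URL
        return None  # None tells Scrapy to keep processing the request


class RandomUserAgentMiddleware:
    """Pick a User-Agent at random for each outgoing request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None


# --- settings.py (sketch) ---

# "myproject" is a placeholder for your own Scrapy project package.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RandomUserAgentMiddleware": 400,
    "myproject.middlewares.ProxyMiddleware": 410,
    # Disable the built-in middleware that sets a fixed User-Agent.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```

With something like this in place, the `parse` callback in the spider above should print the Privoxy address under `proxy` in `response.meta` and one of the listed User-Agent strings. Whether the traffic really leaves through Tor still depends on the Privoxy and Tor configuration, which this sketch does not cover.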
