Scraping HTTPS pages using Scrapy and Crawlera


I would like to know if it is possible to crawl HTTPS pages using Scrapy + Crawlera. So far I have been using Python requests with the following settings:

import requests

proxy_host = 'proxy.crawlera.com'
proxy_port = '8010'
proxy_auth = 'MY_KEY'
proxies    = {
    "https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
    "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)
}
ca_cert    = 'crawlera-ca.crt'

res = requests.get(url='https://www.google.com/',
    proxies=proxies,
    verify=ca_cert
)

I want to move to asynchronous execution via Scrapy. I know there is the scrapy-crawlera plugin, but I do not know how to configure it when I have the certificate. One other thing bothers me: Crawlera comes with different pricing plans. The basic one, C10, allows 10 concurrent requests. What does that mean in practice? Do I need to set CONCURRENT_REQUESTS=10 in settings.py?
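For reference, here is a minimal settings.py sketch of what I think the scrapy-crawlera configuration would look like, based on that plugin's documented settings (the middleware priority 610 and the CRAWLERA_* names come from the plugin; the concurrency values are my guess at matching the C10 plan):

```python
# settings.py — a sketch, assuming the scrapy-crawlera plugin is installed
# (pip install scrapy-crawlera); the setting names follow that plugin.

DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'MY_KEY'  # same key used as proxy_auth above

# Guess at matching the C10 plan: cap Scrapy's own concurrency at the
# plan's limit so requests are not rejected on the proxy side.
CONCURRENT_REQUESTS = 10
AUTOTHROTTLE_ENABLED = False
DOWNLOAD_DELAY = 0
```

My understanding (though I am not sure) is that Scrapy tunnels HTTPS requests through the proxy with CONNECT, so TLS is negotiated end-to-end with the target site and the crawlera-ca.crt file may not be needed at all, unlike in the requests setup above.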
