TorRequests and Python - InvalidSchema: Missing dependencies for SOCKS support

I want to make an anonymous web request using Python 3 with the help of Tor, and I'm following this tutorial: https://computerscienceandfangs.blogspot.com/2018/04/setting-up-tor-for-windows-10-python-3.html.

So far I'm just testing the first part of the tutorial code (below):

import requests

def get_tor_session():
    session = requests.session()
    # Tor uses the 9050 port as the default socks port
    session.proxies = {'http':  'socks5://127.0.0.1:9050',
                       'https': 'socks5://127.0.0.1:9050'}
    return session

# Make a request through the Tor connection
# IP visible through Tor
session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)
# Above should print an IP different than your public IP

# Following prints your normal public IP
print(requests.get("http://httpbin.org/ip").text)

When I execute print(session.get("http://httpbin.org/ip").text), it should show an IP address different from mine. Instead, I get this error:

 File "C:\Program Files\Anaconda3\lib\site-packages\requests\adapters.py", line 43, in SOCKSProxyManager
    try:

InvalidSchema: Missing dependencies for SOCKS support.

I've installed the packages below as per the tutorial:

1) pip install requests --upgrade

2) pip install requests[socks]

3) pip install stem

I'm using Windows 7 (64-bit), Spyder as my Python IDE, and Python 3.5.

My second question is more general. I'm looking to make requests on a bigger scale as part of a web-scraper project. Is the approach above (coding things manually in Python, as in the tutorial I referenced) still a good way to avoid getting banned or blacklisted? Or are there more advanced services that handle anonymous IP requests, IP rotation, and request throttling for you, without you having to write and configure your own software, and with an unlimited number of requests?

Many thanks in advance.

There are 2 best solutions below

0
Hazzaldo On BEST ANSWER

To resolve the error InvalidSchema: Missing dependencies for SOCKS support, I had to reinstall the Tor service on Windows by running the following on the command line:

tor --service remove

then

tor --service install -options ControlPort 9051
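
For context, requests raises InvalidSchema: Missing dependencies for SOCKS support when it cannot import its SOCKS backend, which depends on the PySocks package (module name socks). If reinstalling the service does not help, a minimal check is sketched below, assuming the problem lies with the interpreter Spyder runs rather than with Tor itself. The helper name socks_support_available is made up for illustration.

```python
import importlib.util
import sys


def socks_support_available():
    """Return True if the PySocks module ('socks') can be found by this interpreter."""
    return importlib.util.find_spec("socks") is not None


if __name__ == "__main__":
    # Shows which Python is actually running; useful when pip and the IDE differ
    print("Interpreter:", sys.executable)
    print("PySocks importable:", socks_support_available())
```

If this reports False, installing into that same interpreter (for example with python -m pip install requests[socks]) is a reasonable next step.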

10
Lord Elrond On

Are you running a tor service from the cli?

Your proxy should look like this:

session.proxies = {'http':  'socks5h://127.0.0.1:9050',
                   'https': 'socks5h://127.0.0.1:9050'}
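
The difference between the two schemes is where DNS resolution happens: with socks5 the hostname is resolved locally, while with socks5h it is resolved by the proxy (here, Tor). A small sketch of a helper that builds the proxies mapping either way; the function name tor_proxies and its remote_dns parameter are hypothetical, and the host and port are the defaults from the question:

```python
def tor_proxies(host="127.0.0.1", port=9050, remote_dns=True):
    """Build a requests-style proxies dict pointing at a local Tor SOCKS port.

    remote_dns=True selects the socks5h scheme, so hostnames are resolved
    through the proxy instead of on the local machine.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    url = "{}://{}:{}".format(scheme, host, port)
    return {"http": url, "https": url}


if __name__ == "__main__":
    print(tor_proxies())
```

The returned dict can be assigned to session.proxies exactly as in the question's get_tor_session function.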

Also, requests is synchronous, so it is a poor fit for sending large numbers of requests in the way you describe. I would recommend the following setup instead, which uses aiohttp, aiohttp_socks, and asyncio.

import asyncio, aiohttp
from aiohttp_socks import SocksConnector

async def get_one(url, callback):
    connector = SocksConnector.from_url('socks5://localhost:9050', rdns=True)
    # rdns=True is important! Without it:
    # 1) You can't connect to hidden services
    # 2) DNS lookups are made from your real IP, not your Tor IP
    async with aiohttp.ClientSession(connector=connector) as session:
        print(f'Starting {url}')
        async with session.get(url) as res:
            return await callback(res)

async def get_all(urls, callback):
    # Start one request per URL and wait for all of them to finish
    tasks = [get_one(url, callback) for url in urls]
    return await asyncio.gather(*tasks)

async def test_callback(res):
    # Must be a coroutine, because get_one awaits the callback
    print(res.status)

if __name__ == '__main__':
    urls = [
        'https://python.org',
        'https://google.com',
        #...
    ]

    # asyncio.run requires Python 3.7 or newer
    asyncio.run(get_all(urls, test_callback))
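
Since the question also asks about request throttling, here is a minimal sketch of capping the number of requests in flight with asyncio.Semaphore. It is one possible approach, not part of the setup above: run_throttled and max_concurrent are names invented for illustration, and fetch stands in for any coroutine that takes a URL (such as a wrapper around get_one).

```python
import asyncio


async def run_throttled(urls, fetch, max_concurrent=5):
    """Run fetch(url) for every URL, with at most max_concurrent running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url):
        # Each task waits here until one of the max_concurrent slots is free
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(guarded(url) for url in urls))


async def demo_fetch(url):
    # Placeholder for a real request; only simulates network latency
    await asyncio.sleep(0.01)
    return url


if __name__ == "__main__":
    urls = ["https://python.org", "https://google.com"]
    print(asyncio.run(run_throttled(urls, demo_fetch, max_concurrent=1)))
```

This limits concurrency only; spacing requests out over time or rotating Tor circuits would need additional logic.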