Errors Received when Invoking Urlopen Method


I have a Python script whose objective is to open a web page based on user input and then scrape specific information from that page. The script begins with the following import statements:

import socks
import socket
from urllib.request import urlopen
from time import sleep
from bs4 import BeautifulSoup

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket
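One detail worth noting: as written, this setup still resolves hostnames with the local resolver. PySocks' set_default_proxy accepts an rdns flag that forwards DNS lookups to the SOCKS5 server instead. A minimal sketch of the same monkey-patch with remote DNS enabled (the try/except guard is only there so the snippet degrades gracefully when PySocks is absent):

```python
# Sketch (assumes PySocks is installed): same monkey-patch as above,
# but with rdns=True so hostnames are resolved by the SOCKS5 server
# (Tor) rather than the local resolver -- .onion names have no public
# DNS entry, so local resolution of them always fails.
try:
    import socket
    import socks

    socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True)
    socket.socket = socks.socksocket
    proxy_configured = True
except ImportError:
    proxy_configured = False  # PySocks missing; nothing was patched

print("proxy configured:", proxy_configured)
```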

The part where the error occurs involves processing the URL of the desired web page.

url_name = "http://<website name>"
print("url name is : " + url_name)
print("About to open the web page")
sleep(5)
**webpage = urlopen(url_name)**
print("Web page opened successfully")
sleep(5)
html = webpage.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
print("HTML extracted")
sleep(5)
print("Printing soup object text")
sleep(5)
print(soup.get_text())
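For debugging, it can help to wrap the urlopen call so a failure prints the wrapped reason instead of a long traceback. A minimal sketch under that assumption (fetch_text is a hypothetical helper name, and the timeout value is arbitrary):

```python
from urllib.error import URLError
from urllib.request import urlopen

def fetch_text(url, timeout=30):
    """Return the page body decoded as UTF-8, or None if the open fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except URLError as exc:
        # exc.reason carries the underlying error, e.g. the SOCKS
        # connection failure or the DNS lookup failure seen below.
        print("Could not open", url, "->", exc.reason)
        return None
```

Calling fetch_text(url_name) in place of the bare urlopen makes it easier to see which underlying error (proxy vs. DNS) is actually firing.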

When the script reaches the highlighted statement (the urlopen call), I receive the following error messages:

1599147846 WARNING torsocks[20820]: [connect] Connection to a local address are denied since it might be a TCP DNS query to a local DNS server. Rejecting it for safety reasons. (in tsocks_connect() at connect.c:193)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/socks.py", line 832, in connect
    super(socksocket, self).connect(proxy_addr)
PermissionError: [Errno 1] Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1326, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1240, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1286, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1235, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1006, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 946, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 917, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
  File "/usr/lib/python3/dist-packages/socks.py", line 100, in wrapper
    return function(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/socks.py", line 844, in connect
    raise ProxyConnectionError(msg, error)
socks.ProxyConnectionError: Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 68, in <module>
    webpage = urlopen(url_name)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1355, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.8/urllib/request.py", line 1329, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted>


Also, I had torsocks running in the same VM as this script, which is Ubuntu 20.04.

Someone mentioned running this script with "sudo". However, in doing so, this occurred:

$ sudo python3 dark_web_scrape_main.py 
Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 5, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

So when running this script with "sudo", I cannot even reach the data entry prompt. Yet running it as a regular user, it recognizes the socks module and gets me further.

Prior to running this script, I ensured that I had installed socks, socket, and beautifulsoup4. I even tried installing bs4 (a shorthand wrapper package for 'beautifulsoup4'). This is what displayed:

$ pip3 install bs4
Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Requirement already satisfied: beautifulsoup4 in ./.local/lib/python3.8/site-packages (from bs4) (4.9.1)
Requirement already satisfied: soupsieve>1.2 in ./.local/lib/python3.8/site-packages (from beautifulsoup4->bs4) (2.0.1)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=912f922932a07d98aa26eca2ba3dde8e761813eea766dfe42617135f038943e4
  Stored in directory: /home/jbottiger/.cache/pip/wheels/75/78/21/68b124549c9bdc94f822c02fb9aa3578a669843f9767776bca
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1

I reran the script using "sudo", but I received the same error message:

$ sudo python3 dark_web_scrape_main.py 
Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 5, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'
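The likely cause here is that pip3 run as a regular user installs into ~/.local (visible in the pip output above, ./.local/lib/python3.8/site-packages), which is not on root's module search path, so "sudo python3" cannot import what plain "python3" can. A quick sketch to see where the current interpreter actually looks:

```python
# Sketch: compare the per-user site-packages directory with sys.path.
# Run this once as the regular user and once under sudo; the user
# site directory differs between the two accounts, which is why root
# cannot see a package installed for the regular user.
import site
import sys

user_site = site.getusersitepackages()
print("user site-packages:", user_site)
print("on sys.path:", user_site in sys.path)
```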

I discovered that I hadn't installed the bs4 module where root's Python could see it, so I installed it system-wide:

sudo apt-get install python3-bs4

Rerunning "sudo python3 dark_web_scrape_main.py", I finally got through the input section, but when the script reached the urlopen call this time, the following error message displayed:

About to open the web page
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1326, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1240, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1286, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1235, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1006, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 946, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 917, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 787, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 68, in <module>
    webpage = urlopen(url_name)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1355, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.8/urllib/request.py", line 1329, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
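This last traceback points at DNS rather than the proxy: getaddrinfo is asking the system resolver for the hostname, and .onion names are only resolvable inside the Tor network, so the lookup fails with "Name or service not known". A small sketch of that distinction (needs_remote_dns is a hypothetical helper, not part of any library):

```python
from urllib.parse import urlparse

def needs_remote_dns(url):
    """True if the host must be resolved by the proxy (e.g. Tor), not locally."""
    host = urlparse(url).hostname or ""
    # .onion addresses have no public DNS records; only the Tor
    # network itself can resolve them, so the SOCKS5 server must
    # receive the hostname unresolved.
    return host.endswith(".onion")

print(needs_remote_dns("http://xmh57jrzrnw6insl.onion"))  # True
print(needs_remote_dns("http://example.com"))             # False
```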

I suspected that I could not open onion sites in the Firefox browser within my Ubuntu 20.04 VM either. So, for kicks and giggles, I opened Firefox and entered "http://xmh57jrzrnw6insl.onion" in the browser window. It returned "We cannot connect to the server at 'http://xmh57jrzrnw6insl.onion'".

I researched this specific issue on https://protonmail.com/support/knowledge-base/firefox-onion-sites/, and followed these steps:

  1. In Firefox, enter "about:config" in the browser URL field (aka search bar).
  2. Select the button to "accept the risk and continue".
  3. Enter "network.dns.blockDotOnion" in the search bar.
  4. The current setting for this attribute was "True"; I toggled it to "False".

I retried accessing that onion site. It still didn't work.

I even updated the /etc/tor/torrc file by removing the comment marks from the following statements:

ControlPort 9051
CookieAuthentication 1

I also tried changing the "CookieAuthentication" value to '0'. I still cannot access onion sites.

Finally, I realized that in Firefox's "about:preferences" section, while I had set up Manual Proxy Configuration with localhost:9050, I had forgotten to deselect "Enable DNS over HTTPS" and select "Proxy DNS when using SOCKS v5". Now I can access onion sites in my Firefox browser. However, I still get errors upon reaching the urlopen call in my script. Please advise.

My professor advised me to preface the "python3 <script_name>.py" call with "torsocks". However, I cannot seem to use both "sudo" and "torsocks" as simultaneous prefixes.
