I have a Python script whose objective is to open a web page based on user input and then scrape specific information from that page. The script begins with the following import statements and proxy setup:
import socks
import socket
from urllib.request import urlopen
from time import sleep
from bs4 import BeautifulSoup
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket
The error occurs in the part that processes the URL of the desired web page:
url_name = "http://<website name>"
print("url name is : " + url_name)
print("About to open the web page")
sleep(5)
webpage = urlopen(url_name)  # <-- highlighted statement: the error occurs here
print("Web page opened successfully")
sleep(5)
html = webpage.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
print("HTML extracted")
sleep(5)
print("Printing soup object text")
sleep(5)
print(soup.get_text())
When the script reaches the highlighted statement (where the urlopen method is called), I receive the following error messages:
1599147846 WARNING torsocks[20820]: [connect] Connection to a local address are denied since it might be a TCP DNS query to a local DNS server. Rejecting it for safety reasons. (in tsocks_connect() at connect.c:193)
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/socks.py", line 832, in connect
super(socksocket, self).connect(proxy_addr)
PermissionError: [Errno 1] Operation not permitted
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/urllib/request.py", line 1326, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib/python3.8/http/client.py", line 1240, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1286, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1235, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1006, in _send_output
self.send(msg)
File "/usr/lib/python3.8/http/client.py", line 946, in send
self.connect()
File "/usr/lib/python3.8/http/client.py", line 917, in connect
self.sock = self._create_connection(
File "/usr/lib/python3.8/socket.py", line 808, in create_connection
raise err
File "/usr/lib/python3.8/socket.py", line 796, in create_connection
sock.connect(sa)
File "/usr/lib/python3/dist-packages/socks.py", line 100, in wrapper
return function(*args, **kwargs)
File "/usr/lib/python3/dist-packages/socks.py", line 844, in connect
raise ProxyConnectionError(msg, error)
socks.ProxyConnectionError: Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dark_web_scrape_main.py", line 68, in <module>
webpage = urlopen(url_name)
File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 1355, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.8/urllib/request.py", line 1329, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted>
Also, I had torsocks running in the same VM as this script; the VM runs Ubuntu 20.04.
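One possible reading of the torsocks warning above: the script was launched under torsocks, which intercepts connect() calls and refuses connections to local addresses, while the script itself also tells PySocks to connect to the local proxy at 127.0.0.1:9050. If that reading is right, the two mechanisms conflict and only one of them should be active at a time. The sketch below guesses whether torsocks is active by looking for its library in the LD_PRELOAD environment variable. The helper name `running_under_torsocks` is hypothetical, and the LD_PRELOAD check is an assumption about how torsocks injects itself on Linux.

```python
import os


def running_under_torsocks(environ=None):
    """Return True if a torsocks library appears in LD_PRELOAD.

    Assumption: on Linux, torsocks injects itself through LD_PRELOAD,
    so its library name shows up in that variable.
    """
    if environ is None:
        environ = os.environ
    return "torsocks" in environ.get("LD_PRELOAD", "")


if __name__ == "__main__":
    if running_under_torsocks():
        print("torsocks is active: skip the PySocks setup in the script")
    else:
        print("torsocks is not active: configure PySocks in the script")
```

With a check like this, the `socks.set_default_proxy(...)` and `socket.socket = socks.socksocket` lines could be guarded so they only run when torsocks is not already wrapping the process.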
Someone suggested running this script with "sudo". However, when I did so, this occurred:
$ sudo python3 dark_web_scrape_main.py
Traceback (most recent call last):
File "dark_web_scrape_main.py", line 5, in <module>
from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'
So when I run this script with "sudo", I cannot even get to the data entry prompt. Yet when I run it as a regular user, it recognizes the socks module and thus gets further.
Prior to running this script, I made sure that I had installed socks, socket, and beautifulsoup4. I even tried installing bs4 (abbreviated notation for 'beautifulsoup4'). This is what was displayed:
$ pip3 install bs4
Collecting bs4
Downloading bs4-0.0.1.tar.gz (1.1 kB)
Requirement already satisfied: beautifulsoup4 in ./.local/lib/python3.8/site-packages (from bs4) (4.9.1)
Requirement already satisfied: soupsieve>1.2 in ./.local/lib/python3.8/site-packages (from beautifulsoup4->bs4) (2.0.1)
Building wheels for collected packages: bs4
Building wheel for bs4 (setup.py) ... done
Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=912f922932a07d98aa26eca2ba3dde8e761813eea766dfe42617135f038943e4
Stored in directory: /home/jbottiger/.cache/pip/wheels/75/78/21/68b124549c9bdc94f822c02fb9aa3578a669843f9767776bca
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1
I reran the script using 'sudo', but I received the same error message:
$ sudo python3 dark_web_scrape_main.py
Traceback (most recent call last):
File "dark_web_scrape_main.py", line 5, in <module>
from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'
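I discovered that I hadn't installed the bs4 module properly: pip3 had placed it under my own user's ./.local/lib/python3.8/site-packages, which Python apparently does not search when run via sudo. So I installed the module system-wide: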
sudo apt-get install python3-bs4
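To see why the same import can succeed for one user and fail under sudo, it may help to print which interpreter is running and whether it can locate each module. This is a minimal diagnostic sketch using only the standard library; the helper name `module_available` is hypothetical. Running it once as the regular user and once with sudo should show whether the two environments differ.

```python
import importlib.util
import site
import sys


def module_available(name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("user site-packages:", site.getusersitepackages())
    # bs4 and socks are the third-party modules imported by the script.
    for name in ("bs4", "socks"):
        print(name, "importable:", module_available(name))
```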
Rerunning "sudo python3 dark_web_scrape_main.py", I finally got through the input section, but when the script attempted to execute the urlopen method this time, the following error message was displayed:
About to open the web page
Traceback (most recent call last):
File "/usr/lib/python3.8/urllib/request.py", line 1326, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib/python3.8/http/client.py", line 1240, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1286, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1235, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1006, in _send_output
self.send(msg)
File "/usr/lib/python3.8/http/client.py", line 946, in send
self.connect()
File "/usr/lib/python3.8/http/client.py", line 917, in connect
self.sock = self._create_connection(
File "/usr/lib/python3.8/socket.py", line 787, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dark_web_scrape_main.py", line 68, in <module>
webpage = urlopen(url_name)
File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.8/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 1355, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.8/urllib/request.py", line 1329, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
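The traceback above fails inside getaddrinfo, before any proxy connection is attempted: socket.create_connection tries to resolve the host name locally, and a .onion name cannot be resolved by ordinary DNS. A commonly suggested workaround is to replace socket.getaddrinfo with a function that hands the name back unresolved, so the SOCKS proxy does the lookup instead. The sketch below shows that idea. The function names are hypothetical, and it assumes PySocks forwards unresolved host names to the proxy (its remote DNS behaviour); I have not confirmed it fixes this particular script.

```python
import socket


def passthrough_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Skip local DNS and hand the host name back unchanged.

    Returns a single entry shaped like a socket.getaddrinfo() result, so
    socket.create_connection() passes the unresolved name on to connect().
    """
    return [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", (host, port))]


def install_passthrough():
    """Replace socket.getaddrinfo process-wide (this affects every lookup)."""
    socket.getaddrinfo = passthrough_getaddrinfo
```

In the script, `install_passthrough()` would be called once, after `socket.socket = socks.socksocket` and before `urlopen(url_name)`. Because the patch is global, it only makes sense when every connection in the process is meant to go through the proxy.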
I suspected that I could not open onion sites in the Firefox browser within my Ubuntu 20.04 VM either. So, out of curiosity, I opened Firefox and entered "http://xmh57jrzrnw6insl.onion" in the address bar. It returned "We cannot connect to the server at 'http://xmh57jrzrnw6insl.onion'".
I researched this specific issue on https://protonmail.com/support/knowledge-base/firefox-onion-sites/, and followed these steps:
- In Firefox, entered "about:config" in the browser URL field (aka search bar).
- Selected the button to "accept the risk and continue".
- Entered "network.dns.blockDotOnion" in the search bar.
- The current setting for this attribute was "true"; toggled it to "false".
I then retried accessing that onion site. It still did not work.
I even updated the /etc/tor/torrc file by removing the comment marks from the following statements:
ControlPort 9051
CookieAuthentication 1
I also changed the "CookieAuthentication" attribute value to '0'. I still could not access onion sites.
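As far as I understand, ControlPort only exposes Tor's control interface, while the script talks to the SOCKS port 9050, so the torrc changes above may not affect the script at all. A quick way to check which of the two ports is actually accepting connections is a plain TCP probe. This is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical. It should be run without torsocks, since torsocks rejects connections to local addresses.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9050 is the SOCKS port used in the script; 9051 is the ControlPort from torrc.
    for port in (9050, 9051):
        print(port, "open" if port_is_open("127.0.0.1", port) else "closed")
```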
Finally, I realized that in the "about:preferences" section in Firefox, while I had set up Manual Proxy Configuration with localhost:9050, I had forgotten to deselect "Enable DNS over HTTPS" and select "Proxy DNS when using SOCKS v5". Now I can access onion sites in my Firefox browser. However, I still get errors upon reaching the urlopen method call in my script. Please advise.
My professor advised me to preface the "python3 <script_name>.py" call with "torsocks". However, I cannot seem to use both "sudo" and "torsocks" as simultaneous prefixes.