keyword argument clashes with variable


Although this is most likely a newbie question, I struggled to find any information online to help me with my problem.

My code is meant to scrape onion sites. Connecting to Tor works, and the web scraper works fine on its own, but when I combined the two code blocks I kept getting errors about the keyword arguments in my code. Even deleting them produces errors, so I'm a bit lost on what I'm supposed to do.

import socket
import socks
import requests
from pywebcopy import save_webpage

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

def get_tor_session():
    session = requests.session()
    # Tor uses the 9050 port as the default socks port
    session.proxies = {'http':  'socks5h://127.0.0.1:9050',
                       'https': 'socks5h://127.0.0.1:9050'}
    return session


session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)

kwargs = {'project_name': 'site folder'}

save_webpage(
    # url of the website
    session.get(url="http://elfqv3zjfegus3bgg5d7pv62eqght4h6sl6yjjhe7kjpi2s56bzgk2yd.onion"),

    # folder where the copy will be saved
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)

In this case, I'm presented with the following error:

TypeError: Cannot mix str and non-str arguments

Attempting to replace

project_folder=r"C:\Users\admin\Desktop\WebScraping",
**kwargs

with

kwargs, 
project_folder=r"C:\Users\admin\Desktop\WebScraping"

presents me with this error:

TypeError: save_webpage() got multiple values for argument
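The second error can be reproduced without pywebcopy at all. The sketch below uses a hypothetical stand-in, `fake_save_webpage`, whose parameter order (`url`, `project_folder`, `project_name`) is only an assumption based on the `config.setup_config(url, project_folder, project_name, **kwargs)` call in the traceback; it is not pywebcopy's documented signature. Passing `kwargs` positionally drops the dict into the `project_folder` slot, so the later `project_folder=` keyword supplies that parameter a second time.

```python
# Hypothetical stand-in for save_webpage; the parameter order is an
# assumption taken from the traceback, not pywebcopy's real API.
def fake_save_webpage(url, project_folder, project_name=None, **kwargs):
    return url, project_folder, project_name


kwargs = {'project_name': 'site folder'}
folder = r"C:\Users\admin\Desktop\WebScraping"

message = None
try:
    # The dict lands in the second positional slot (project_folder),
    # and then project_folder is passed again by keyword.
    fake_save_webpage("http://example.onion", kwargs, project_folder=folder)
except TypeError as exc:
    message = str(exc)

print(message)

# Unpacking with ** passes project_name by keyword, which binds cleanly.
result = fake_save_webpage("http://example.onion", project_folder=folder, **kwargs)
print(result)
```

In other words, `**kwargs` in the original call was correct; the remaining problem lies in the first argument.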

Traceback for the first error:

  File "C:\Users\admin\Desktop\WebScraping\tor.py", line 43, in <module>
    **kwargs

  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\api.py", line 58, in save_webpage
    config.setup_config(url, project_folder, project_name, **kwargs)

  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\configs.py", line 189, in setup_config
    SESSION.load_rules_from_url(urljoin(project_url, '/robots.txt'))

  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 487, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)

  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 120, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")

I'd really appreciate an explanation of what causes this bug and how to avoid it in the future.


Answer by AanTuning

SOLVED.

Adding the following code resolved the issue:

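import socket

# Bypass local DNS: return the hostname and port unresolved, so the
# SOCKS-wrapped socket passes the name (e.g. a .onion address) on to Tor.
def getaddrinfo(*args):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]

socket.getaddrinfo = getaddrinfo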
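To see what this patch does without Tor or network access, the sketch below installs the same function temporarily, looks up a placeholder name (`example.onion`, not a real service), and then restores the original `socket.getaddrinfo`. The hostname comes back untouched. With PySocks' default remote DNS (`rdns=True` in `set_default_proxy`), that name is then forwarded to the Tor proxy instead of going through a local DNS lookup, which cannot resolve `.onion` names. The `try`/`finally` restore exists only for this demonstration.

```python
import socket


def getaddrinfo(*args):
    # Same patch as above: echo (host, port) back as the "resolved" address.
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]


original_getaddrinfo = socket.getaddrinfo
socket.getaddrinfo = getaddrinfo
try:
    # No DNS query happens; the .onion name is returned as-is.
    info = socket.getaddrinfo("example.onion", 80)
finally:
    # Put the real resolver back so the rest of the program is unaffected.
    socket.getaddrinfo = original_getaddrinfo

print(info)
```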
Answer by Paul M.

Not sure why this hasn't been answered yet. As mentioned in my comment, simply change this:

save_webpage(
    # url of the website
    session.get(url=...),

    # folder where the copy will be saved            
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)

To:

save_webpage(
    # url of the website
    url=...,

    # folder where the copy will be saved            
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)

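save_webpage expects the URL as a string and makes the request internally. session.get(...) returns a requests Response object rather than a URL, and when that object reaches the urljoin(project_url, '/robots.txt') call shown in the traceback, urllib.parse raises "TypeError: Cannot mix str and non-str arguments".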
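The failing step can be reproduced with the standard library alone. In this minimal sketch, `FakeResponse` is a hypothetical stand-in for the object that `session.get(...)` returns: like a successful `requests.Response`, it is truthy but is not a `str`, so `urljoin` refuses to combine it with the string `'/robots.txt'`.

```python
from urllib.parse import urljoin


class FakeResponse:
    """Hypothetical stand-in for requests.Response: truthy, but not a str."""


error = None
try:
    # Mirrors the call in the traceback, but with a non-str base.
    urljoin(FakeResponse(), "/robots.txt")
except TypeError as exc:
    error = str(exc)

print(error)

# With a plain string base, as the answer suggests, the join works.
robots = urljoin("http://example.onion/", "/robots.txt")
print(robots)
```

The general rule: when a library says it takes a URL, pass the URL string and let the library make the request.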