# Run inside `scrapy shell`, where fetch() and the `response` object are available:
from scrapy import FormRequest

url = "https://stackoverflow.com/users/login"
fetch(url)
req = FormRequest.from_response(
    response,
    formid='login-form',
    formdata={'email': '[email protected]',
              'password': 'testpw'},
    clickdata={'id': 'submit-button'},
)
fetch(req)
Using the above code in scrapy shell, I could log in to Stack Overflow. However, I wanted to perform this login from a script instead of typing the commands by hand, so I tried to run the same steps through subprocess.
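(For context, I understand that the usual way to do this inside a spider, without subprocess, is to yield the FormRequest directly and let Scrapy's engine send it; a rough sketch of what I mean, with a hypothetical after_login callback:

import scrapy
from scrapy import FormRequest

class LoginSpider(scrapy.Spider):
    name = 'stackover_login'
    start_urls = ['https://stackoverflow.com/users/login']

    def parse(self, response):
        # Yield the request so the Scrapy engine sends it,
        # instead of shelling out to `scrapy fetch`:
        yield FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'email': '[email protected]', 'password': 'testpw'},
            clickdata={'id': 'submit-button'},
            callback=self.after_login,
        )

    def after_login(self, response):
        self.logger.info("Landed on %s after login", response.url)

But I specifically wanted to try the subprocess route. Here is my attempt: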
import subprocess
import scrapy
from scrapy import FormRequest
from subprocess import run
from bs4 import BeautifulSoup

class QuoteSpider(scrapy.Spider):
    name = 'stackover'
    start_urls = ['https://stackoverflow.com/users/login']
    run(["scrapy", "fetch", start_urls[0]], capture_output=True, text=True)

    def parse(self, response):
        req = FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'email': '[email protected]',
                      'password': 'testpw'},
            clickdata={'id': 'submit-button'},
        )
        run(["scrapy", "fetch", req], shell=True)
But it gives me an error like this:
TypeError: argument of type 'FormRequest' is not iterable
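I assume this happens because subprocess expects every command-line argument to be a string (or path-like object), while I am passing a FormRequest object. A minimal sketch that seems to reproduce the same kind of TypeError without Scrapy, with a throwaway NotAString class standing in for the FormRequest (the exact message can vary by platform):

from subprocess import run

class NotAString:
    pass

# subprocess inspects each argument as text, so a non-string
# object raises a TypeError before anything is executed:
run(["echo", NotAString()])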
I also tried saving the response to an HTML file and reading that file back in as the response, but I got the same error message as above.
with open("output.html","w") as f:
response = call(["scrapy","fetch", url], stdout=f, shell=True)
with open("output.html", encoding="utf-8") as f:
data = f.read()
response = BeautifulSoup(data, 'lxml')
I have also tried getting the response as text, and again got the error message mentioned above.
r = run(["scrapy","fetch", start_urls[0]], capture_output=True)
response = r.stdout.decode()
I also tried building the FormRequest before the parse function is called, like this:
class QuoteSpider(scrapy.Spider):
    name = 'stackover'
    start_urls = ['https://stackoverflow.com/users/login']
    r = run(["scrapy", "fetch", start_urls[0]], capture_output=True)
    response = r.stdout.decode()
    req = FormRequest.from_response(
        response,
        formid='login-form',
        formdata={'email': '[email protected]',
                  'password': 'testpw'},
        clickdata={'id': 'submit-button'},
    )
    run(["scrapy", "fetch", req], shell=True)

    def parse(self, response):
        print(response)
And this time I got a new error:
AttributeError: 'str' object has no attribute 'encoding'
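From this error, I suspect that from_response() expects an actual Scrapy Response object (something with an .encoding attribute), not a plain string. A sketch of what I think it wants, wrapping the fetched HTML in scrapy.http.HtmlResponse (an untested guess on my part):

from scrapy import FormRequest
from scrapy.http import HtmlResponse

url = "https://stackoverflow.com/users/login"
html = r.stdout.decode()  # the HTML captured from `scrapy fetch` above

# Wrap the raw HTML in a real Response object so from_response()
# can read attributes like .encoding and .url:
response = HtmlResponse(url=url, body=html, encoding='utf-8')
req = FormRequest.from_response(
    response,
    formid='login-form',
    formdata={'email': '[email protected]', 'password': 'testpw'},
    clickdata={'id': 'submit-button'},
)

But even if that is right, I still don't know how to actually send req through subprocess, which leads to my question: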
So, how can I run Scrapy shell commands using subprocess to log in to Stack Overflow? And what exactly does FormRequest.from_response() in Scrapy expect as its response argument?
I am learning Scrapy and trying out various ways of logging in to Stack Overflow to practice web scraping.
You can run this spider from the terminal using

scrapy crawl stackover

(matching the name 'stackover' defined in the spider above).