What exactly should we pass as the response when making a POST request in Scrapy?

from scrapy import FormRequest

url = "https://stackoverflow.com/users/login"
fetch(url)
req = FormRequest.from_response(
    response,
    formid='login-form',
    formdata={'email': '[email protected]',
              'password': 'testpw'},
    clickdata={'id': 'submit-button'},
)
fetch(req)

Using the above code in the Scrapy shell, I could log in to Stack Overflow. But I wanted to perform this activity from a script rather than as command-line input, so I tried to run the same commands through subprocess.

import subprocess
import scrapy
from scrapy import FormRequest
from subprocess import run
from bs4 import BeautifulSoup

class QuoteSpider(scrapy.Spider):
    name = 'stackover'
    start_urls = ['https://stackoverflow.com/users/login']

    run(["scrapy","fetch", start_urls[0]], capture_output=True, text=True)

    def parse(self, response):
        req = FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'email': '[email protected]',
                    'password': 'testpw'},
            clickdata={'id': 'submit-button'},
        )
        run(["scrapy","fetch", req], shell=True)

But it gives me this error:

TypeError: argument of type 'FormRequest' is not iterable
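
This TypeError happens because every element of a subprocess argument list must be a string (or path-like object); a FormRequest object cannot be serialized onto a command line. A minimal sketch of the constraint, using the login URL as a plain string:

from subprocess import run

# Every argument passed to run() must be a string; a FormRequest
# object is not, which triggers the TypeError above.
url = "https://stackoverflow.com/users/login"   # plain string: OK
run(["scrapy", "fetch", url], capture_output=True, text=True)
# run(["scrapy", "fetch", req])  # req is a FormRequest -> TypeError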

I also tried saving the response to an HTML file and reading that file back in as the response, and got the same error message as above.

from subprocess import call

with open("output.html","w") as f:
    response = call(["scrapy","fetch", url], stdout=f, shell=True)

with open("output.html", encoding="utf-8") as f:
    data = f.read()
    response = BeautifulSoup(data, 'lxml')
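
A BeautifulSoup object is not a Scrapy Response either, so from_response() rejects it for the same underlying reason. If the goal is to reuse saved HTML, one option (a sketch, assuming output.html holds the login page) is to wrap the file's contents in a scrapy.http.HtmlResponse, which is the type from_response() actually expects:

from scrapy.http import HtmlResponse

# Wrap raw HTML in a real Response object; from_response() needs
# response.url to resolve the form's action attribute and
# response.encoding/text to parse the document.
with open("output.html", encoding="utf-8") as f:
    response = HtmlResponse(
        url="https://stackoverflow.com/users/login",
        body=f.read(),
        encoding="utf-8",
    )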

I have also tried getting the response as text, and again got the error message mentioned above.

r = run(["scrapy","fetch", start_urls[0]], capture_output=True)
response = r.stdout.decode()

I also tried building the FormRequest before the parse method is called, like this:

class QuoteSpider(scrapy.Spider):
    name = 'stackover'
    start_urls = ['https://stackoverflow.com/users/login']

    r = run(["scrapy","fetch", start_urls[0]], capture_output=True)
    response = r.stdout.decode()

    req = FormRequest.from_response(
        response,
        formid='login-form',
        formdata={'email': '[email protected]',
                'password': 'testpw'},
        clickdata={'id': 'submit-button'},
    )
    run(["scrapy","fetch", req], shell=True)

    def parse(self, response):
        print(response)

And I got a new error:

AttributeError: 'str' object has no attribute 'encoding'

So, how can I run Scrapy shell commands through subprocess to log in to Stack Overflow? And what exactly does FormRequest.from_response() in Scrapy take as its response input?
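
To answer the type question directly: FormRequest.from_response() expects a scrapy.http.TextResponse subclass, in practice an HtmlResponse, which is exactly the object Scrapy hands to a spider's parse() callback. A self-contained sketch with placeholder HTML (the form markup below is invented and does not reflect the real Stack Overflow page):

from scrapy import FormRequest
from scrapy.http import HtmlResponse

# Placeholder login page; the real page's field names will differ.
html = (b"<html><body>"
        b"<form id='login-form' action='/login' method='post'>"
        b"<input name='email'/><input name='password'/>"
        b"</form></body></html>")
response = HtmlResponse(url="https://stackoverflow.com/users/login", body=html)

req = FormRequest.from_response(
    response,
    formid='login-form',
    formdata={'email': '[email protected]', 'password': 'testpw'},
)
print(req.method, req.url)  # POST https://stackoverflow.com/login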

I am learning Scrapy and trying various ways of logging in to Stack Overflow to practice web scraping.


There is 1 answer below.

from scrapy import FormRequest
from scrapy import Spider

class StackSpider(Spider):
    name = 'stack_spider'
    # List of URLs for the initial requests. Can be one or many.
    # The default callback parse() is called for the start responses.
    start_urls = ["https://stackoverflow.com/users/login"]

    # Parsing users/login page. Getting form and moving on.
    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'email': '[email protected]',
                    'password': 'testpw'},
            clickdata={'id': 'submit-button'},
            callback=self.parse_login
        )

    # Parsing login result
    def parse_login(self, response):
        print('Checking logging in here.')

You can run this spider from the terminal with scrapy crawl stack_spider.
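
If you want to start the spider from a plain Python script instead of the command line (which is what the subprocess attempts were reaching for), Scrapy's CrawlerProcess can run it in-process; a minimal sketch, assuming the spider class above is defined in the same file:

from scrapy.crawler import CrawlerProcess

# Runs the spider inside this Python process; no subprocess needed.
process = CrawlerProcess()
process.crawl(StackSpider)
process.start()  # blocks until the crawl finishes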