multipart/form-data, POST method, with multiple forms in the page


Problem

I'm trying to scrape a page using Python's requests library, but I keep getting errors (such as Bad Request or Method Not Allowed).

  • The page has two forms: one submitted with GET and another submitted with POST (the one I want). I passed values to its text fields through the data argument of requests.

  • I don't want to send an image with the form, just a text field.

  • The form has six buttons, and each button has a different value.


HTML code

<form enctype="multipart/form-data" action="/page1" method="GET"> ... </form>
...
<form enctype="multipart/form-data" action="/page2" method="POST"> 
  <input type="file" name="smiles_file">
  <input type="text" name="smiles_str">
  ...
  <button name="pred_type" type="submit" value="adme"> BT1 </button>
  <button name="pred_type" type="submit" value="toxicity"> BT2 </button>
</form>

Python3 code

#imports
import requests
from bs4 import BeautifulSoup as bs

#common vars
url = 'http://www.exampleurl.com/site'
hd  = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"
}
dt  = {
'smiles_str': 'CC(=O)OC1=CC=CC=C1C(=O)O',
'pred_type': 'adme'
}

#scraping
with requests.Session() as rs:
    result = rs.get(url, data=dt, headers=hd)
    print("Code: %s\nHTML\n%s" % (result.status_code, result.text))

EDIT

  • Using get: status_code 405 (Method ...)

  • Using post: status_code 400 (Bad request)


There are 2 best solutions below


I don't see a reference to /page1 or /page2 in your example. The request URL should match the action of the form you are submitting. For the first form (method GET, action /page1), rs.get should use the named parameter params instead of data. For the second form (method POST, action /page2), you need rs.post, where using data is okay.
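Here is a minimal sketch of both calls, built as prepared requests so they can be inspected without touching the network. One caveat: when you pass a plain dict as data, requests encodes the body as application/x-www-form-urlencoded, not multipart/form-data. If the server insists on multipart (only a guess at the cause of the 400), passing the fields through files as (None, value) tuples produces a multipart body with no file attached. The helper names and the "q" field are made up for illustration; the field names and actions for /page2 come from the HTML in the question.

```python
from urllib.parse import urljoin

import requests

# Placeholder from the question: the page that contains both forms.
PAGE_URL = "http://www.exampleurl.com/site"


def build_get(page_url, query):
    """Prepare a GET to the first form's action (/page1); fields go in the query string."""
    req = requests.Request("GET", urljoin(page_url, "/page1"), params=query)
    return req.prepare()


def build_multipart_post(page_url, smiles, pred_type):
    """Prepare a multipart/form-data POST to the second form's action (/page2)."""
    # A (None, value) tuple makes requests emit a plain form field with no
    # filename, so the body is multipart without uploading any file.
    fields = {
        "smiles_str": (None, smiles),
        "pred_type": (None, pred_type),  # value of the button being "clicked"
    }
    req = requests.Request("POST", urljoin(page_url, "/page2"), files=fields)
    return req.prepare()


# "q" is a hypothetical field name; the first form's fields are not shown.
get_req = build_get(PAGE_URL, {"q": "aspirin"})
post_req = build_multipart_post(PAGE_URL, "CC(=O)OC1=CC=CC=C1C(=O)O", "adme")

print(get_req.url)
print(post_req.headers["Content-Type"])
```

To actually send one of them, use rs.send(post_req) inside the requests.Session() block. Note that a request prepared this way does not pick up session-level headers, so pass headers=hd to requests.Request if the User-Agent matters. Whether the server accepts the form without the smiles_file part, or needs other fields hidden behind the "...", is something only the real page can tell you.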


I think I found the answer. It seems that requests does not work well on pages that depend on JavaScript running in the background. I'm using selenium now, and I'm not having problems with it.