I'm trying to scrape a page for data, but its login process has me stumped. As it's an ASP.NET site, my searches have led me to include __VIEWSTATE and __VIEWSTATEGENERATOR, but I cannot find __EVENTTARGET or __EVENTVALIDATION on the page, and I'm not sure whether they can sometimes be missing.
The website's login page has this form (personal data gets prefilled, so I've masked those values with *****):
<form method="get" action="./login.aspx" id="validateSubmitForm" autocomplete="off" novalidate="">
<div class="aspNetHidden">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="*****long viewstate here****" />
</div>
<div class="aspNetHidden">
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="******" />
</div>
<div class="row">
<div class="form-group col-md-12 mb-4">
<!--
<input type="email" class="form-control input-lg" id="email" aria-describedby="emailHelp"
placeholder="email"> -->
<input name="TextBox1N" type="text" value="*******" id="TextBox1N" title="Username" class="form-control input-lg" placeholder="Username" />
</div>
<div class="form-group col-md-12 ">
<!--
<input type="password" class="form-control input-lg" id="password" placeholder="Password">
-->
<input name="TextBox2N" type="password" id="TextBox2N" class="form-control input-lg" placeholder="Password" value="******" />
</div>
<div class="form-group col-md-12 ">
</div>
<div class="col-md-12">
<div class="d-flex justify-content-between mb-3">
<div class="custom-control custom-checkbox mr-3 mb-3">
<!--
<input type="checkbox" class="custom-control-input" id="customCheck2">
<label class="custom-control-label" for="customCheck2">Remember me</label>
-->
<input id="CheckBox1N" type="checkbox" name="CheckBox1N" checked="checked" />
<span id="remember_meN" for="CheckBox1N">Remember me</span>
</div>
<a class="text-color" href="remember.aspx"> Remember </a>
</div>
<!--
<button type="submit" class="btn btn-primary btn-pill mb-4" style="width:100% !important">Sign In</button>
-->
<input type="submit" name="Button1N" value="Sign in" id="Button1N" class="btn btn-primary btn-pill mb-4" style="width:100% !important" />
<p>
Don't have an account yet ?
<a class="text-blue" href="registrati.aspx"> Sign</a>
<!--
<input type="submit" name="Button2N" value="Sign up" id="Button2N" class="text-blue" />
-->
</p>
</div>
</div>
</form>
What I've cobbled together so far is this (URL and login info masked):
from bs4 import BeautifulSoup
import requests
#Session Setup
s = requests.Session()
s.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"})
uName='******'
pwd ='******'
#Load page
url='http://***/login.aspx'
r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
#Set params
paramsPost = {
    "TextBox1N": uName,
    "TextBox2N": pwd,
    "CheckBox1N": "on",
    "Button1N": "Sign in",
}
#Add __VIEWSTATE params
paramsPost['__VIEWSTATE'] = soup.find('input', id='__VIEWSTATE')['value']
paramsPost['__VIEWSTATEGENERATOR'] = soup.find('input', id='__VIEWSTATEGENERATOR')['value']
#Login to a GET form
req = requests.Request('GET', url, data=paramsPost)
prep = req.prepare()
pUrl = prep.url+'?'+prep.body #this was mostly done so I could print the full url and verify against a browser generated one
r = s.get(pUrl) #request the URL that carries the query string, not the bare login URL
For completeness, I have also tried the following:
r = s.post(url, data=paramsPost)
print(r.url)
Both ways just send me to the ./error.aspx page.
Logging in with a browser and inspecting the network traffic shows that a GET request was made, with __VIEWSTATE, __VIEWSTATEGENERATOR, TextBox1N, TextBox2N, CheckBox1N and Button1N added to the request URL. It returned status 302 and then redirected to ./dashboardAssets.aspx
Interestingly, the __VIEWSTATE my code picks up is shorter than the __VIEWSTATE my browser sends. Is this related?
Everything I find on Google or SO points to the __EVENT params, but I can't locate them on this page, so I'm not sure this site needs them.
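To check the script's request against the one the browser made, the two URLs can be compared parameter by parameter. This is a minimal sketch using only the standard library; `build_get_url` and `diff_query` are hypothetical helper names, and `example.invalid` in the usage below stands in for the masked host.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def build_get_url(url, fields):
    """Append form fields as a query string, as a browser does for method="get"."""
    return url + "?" + urlencode(fields)


def diff_query(browser_url, script_url):
    """Report query parameters that are missing, extra, or different."""
    browser = dict(parse_qsl(urlsplit(browser_url).query, keep_blank_values=True))
    script = dict(parse_qsl(urlsplit(script_url).query, keep_blank_values=True))
    shared = browser.keys() & script.keys()
    return {
        # sent by the browser but not by the script
        "missing": sorted(browser.keys() - script.keys()),
        # sent by the script but not by the browser
        "extra": sorted(script.keys() - browser.keys()),
        # present in both, but with different values
        "changed": sorted(k for k in shared if browser[k] != script[k]),
    }
```

Calling `diff_query(browser_url, build_get_url("http://example.invalid/login.aspx", paramsPost))` with the URL copied from the browser's network tab would list which parameters differ. With requests, passing `allow_redirects=False` to `s.get` also makes it possible to look at the 302 response itself before it is followed.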
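Rather than naming the hidden fields by hand, a script could collect every input the form actually contains, so that __EVENTVALIDATION or similar fields are picked up if the server ever emits them. This is a minimal sketch using the standard library's `html.parser`; `FormFieldCollector` and `collect_form_fields` are hypothetical names, and the form id is taken from the HTML above.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> tags inside one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            if attrs.get("type") == "checkbox":
                # Unchecked boxes are not submitted; checked ones default to "on".
                if "checked" not in attrs:
                    return
                value = attrs.get("value") or "on"
            else:
                value = attrs.get("value") or ""
            self.fields[attrs["name"]] = value

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def collect_form_fields(html, form_id="validateSubmitForm"):
    """Return the form's fields in document order; commented-out inputs are skipped."""
    parser = FormFieldCollector(form_id)
    parser.feed(html)
    return parser.fields
```

The result of `collect_form_fields(r.text)` could then be updated with the real username and password and sent with `s.get(url, params=fields)`. Note that this sketch includes every submit button it finds, whereas a browser only sends the one that was clicked; that is fine for this form, which has a single live submit input.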
Any other ideas I can try?