I am scraping an ASP.NET site by submitting form data, using a ThreadPool to send 4 parallel requests. The first batch of parallel requests is processed correctly and I get the desired response, which I process as needed.
From the next request onward, however, I get a runtime error as the response (Description: An exception occurred while processing your request. Additionally, another exception occurred while executing the custom error page for the first exception. The request has been terminated.), so I am unable to get past the first 4 requests. Any suggestions to improve the code snippet below are welcome.
import urllib.parse
from multiprocessing.pool import ThreadPool

import requests
from bs4 import BeautifulSoup

def EpicNo_Search(EpicNo):
    print(EpicNo)
    global headers, formData, url
    choice = '21'
    # These intermediate requests are made to get the __EVENTVALIDATION and __VIEWSTATE tokens
    session = requests.session()
    res = session.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')
    formData['__EVENTVALIDATION'], formData['__VIEWSTATE'] = extract_form_hiddens(soup)
    formData['ctl00$ContentPlaceHolder1$gr1'] = 'RadioButton2'
    formData['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$RadioButton2'
    res = session.post(url, urllib.parse.urlencode(formData), headers=headers)
    if "Server Error" in res.text:
        filename = 'zidlist'
        with open('./{}.txt'.format(filename), mode='at', encoding='utf-8') as file:
            file.write(EpicNo)
    else:
        # Final request
        soup = BeautifulSoup(res.text, 'lxml')
        formData['__EVENTVALIDATION'], formData['__VIEWSTATE'] = extract_form_hiddens(soup)
        formData['ctl00$ContentPlaceHolder1$gr1'] = 'RadioButton2'
        formData['ctl00$ContentPlaceHolder1$Drop4'] = choice
        formData['__EVENTTARGET'] = ''
        formData['ctl00$ContentPlaceHolder1$TextBox4'] = EpicNo
        formData['ctl00$ContentPlaceHolder1$Button3'] = 'Search'
        res = session.post(url, formData, headers=headers)
        if 'No Record Found , Please Fill Form 6' in res.text:
            write_csv('No Match', 'output.csv', epicno=EpicNo)
        else:
            write_csv(res.text.encode('utf-8'), 'output.csv')

# We make 4 parallel requests to the website for faster result consolidation
pool = ThreadPool(4)
pool.map(EpicNo_Search, epicnolist)
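
extract_form_hiddens is not shown above; as a rough idea of what it does, here is a minimal sketch assuming the standard ASP.NET hidden-input ids (__EVENTVALIDATION and __VIEWSTATE), not my exact implementation:

def extract_form_hiddens(soup):
    # Hypothetical sketch: read the standard ASP.NET hidden fields from the
    # parsed page; the names match the formData keys used above.
    eventvalidation = soup.find('input', {'id': '__EVENTVALIDATION'})['value']
    viewstate = soup.find('input', {'id': '__VIEWSTATE'})['value']
    return eventvalidation, viewstate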
My request headers include a User-Agent string, Cache-Control: max-age=0 and Connection: keep-alive.
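
Roughly, the headers dict looks like this (the User-Agent value below is just a placeholder, not the exact string I send):

headers = {
    'User-Agent': 'Mozilla/5.0',   # placeholder; the real value is a full browser UA string
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
}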