Python web scraping: ASP.NET site returns Internal Server Error after initial success


I am scraping an ASP.NET site by submitting form data, using a thread pool to send 4 parallel requests. The first set of parallel requests is processed correctly, and I get the desired response, which I process as needed.

From the next request onward, however, I get a runtime error as the response (Description: An exception occurred while processing your request. Additionally, another exception occurred while executing the custom error page for the first exception. The request has been terminated.). So I am unable to process more than 4 requests at a time. Any suggestions to improve the code snippet below are welcome.

import urllib.parse
from multiprocessing.pool import ThreadPool

import requests
from bs4 import BeautifulSoup

# headers, formData, url, epicnolist, extract_form_hiddens and write_csv
# are defined elsewhere in the script.

def EpicNo_Search(EpicNo):
    print(EpicNo)
    # Note: formData is a module-level dict, so all worker threads share and mutate it.
    global headers, formData, url
    choice = '21'
    # These intermediate requests are made to get the __EVENTVALIDATION and __VIEWSTATE tokens
    session = requests.session()
    res = session.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')

    formData['__EVENTVALIDATION'], formData['__VIEWSTATE'] = extract_form_hiddens(soup)
    formData['ctl00$ContentPlaceHolder1$gr1'] = 'RadioButton2'
    formData['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$RadioButton2'
    res = session.post(url, urllib.parse.urlencode(formData), headers=headers)

    if "Server Error" in res.text:
        # Record the failed ID so it can be retried later
        filename = 'zidlist'
        with open('./{}.txt'.format(filename), mode='at', encoding='utf-8') as file:
            file.write(EpicNo + '\n')  # one ID per line
    else:
        # Final request
        soup = BeautifulSoup(res.text, 'lxml')
        formData['__EVENTVALIDATION'], formData['__VIEWSTATE'] = extract_form_hiddens(soup)
        formData['ctl00$ContentPlaceHolder1$gr1'] = 'RadioButton2'
        formData['ctl00$ContentPlaceHolder1$Drop4'] = choice
        formData['__EVENTTARGET'] = ''
        formData['ctl00$ContentPlaceHolder1$TextBox4'] = EpicNo
        formData['ctl00$ContentPlaceHolder1$Button3'] = 'Search'
        res = session.post(url, formData, headers=headers)
        if 'No Record Found , Please Fill Form 6' in res.text:
            write_csv('No Match', 'output.csv', epicno=EpicNo)
        else:
            write_csv(res.text.encode('utf-8'), 'output.csv')


# We make 4 parallel requests to the website for faster result consolidation
pool = ThreadPool(4)
pool.map(EpicNo_Search, epicnolist)
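
The helper extract_form_hiddens is not shown in the snippet above. As a rough illustration of what such a helper might do, here is a minimal sketch that pulls the two hidden tokens out of a page using only the standard library's html.parser rather than BeautifulSoup. The names HiddenFieldParser and extract_hidden_fields are hypothetical, and unlike the original helper this version takes the raw HTML text instead of a soup object.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        if (attr_map.get("type") or "").lower() == "hidden" and attr_map.get("name"):
            self.hidden[attr_map["name"]] = attr_map.get("value") or ""


def extract_hidden_fields(html_text):
    """Return (__EVENTVALIDATION, __VIEWSTATE), in the same order the question's code unpacks them."""
    parser = HiddenFieldParser()
    parser.feed(html_text)
    # Missing fields come back as empty strings (an assumption of this sketch).
    return parser.hidden.get("__EVENTVALIDATION", ""), parser.hidden.get("__VIEWSTATE", "")
```

Both tokens are re-read after every response in the code above, because the values from one page are only expected to be valid for the next postback of that same page.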

My request headers include a User-Agent, Cache-Control (max-age=0), and Connection (keep-alive).
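
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: formData is a global dict that every worker thread mutates, so the tokens one thread stores after its GET could be overwritten by another thread before the POST is sent. A minimal sketch of building a fresh dict per call is shown below. BASE_FORM, build_form and prepare_search are hypothetical names, and the sketch makes no network requests.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in for the shared, module-level formData in the question.
BASE_FORM = {
    "__EVENTTARGET": "",
    "ctl00$ContentPlaceHolder1$gr1": "RadioButton2",
}


def build_form(base, updates):
    """Return a new dict; the shared base is never mutated."""
    form = dict(base)
    form.update(updates)
    return form


def prepare_search(epic_no):
    # Each call gets its own dict, so threads cannot overwrite each other's fields.
    return build_form(BASE_FORM, {"ctl00$ContentPlaceHolder1$TextBox4": epic_no})


if __name__ == "__main__":
    with ThreadPool(4) as pool:
        forms = pool.map(prepare_search, ["A1", "B2", "C3", "D4", "E5"])
    print([form["ctl00$ContentPlaceHolder1$TextBox4"] for form in forms])
```

In the real function, the per-call dict would also receive the __EVENTVALIDATION and __VIEWSTATE values read from that thread's own session before each POST.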
