I am trying to access a site that has bot prevention.
With the following script using requests I can access the site

    import requests

    request = requests.get(url, headers={**HEADERS, 'Cookie': cookies})

and I get the desired HTML. But when I use aiohttp:
    import aiohttp

    async def get_data(session: aiohttp.ClientSession, url, cookies):
        async with session.get(url, timeout=5, headers={**HEADERS, 'Cookie': cookies}) as response:
            text = await response.text()
            print(text)
I get the bot prevention page as the response.
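(For context, the coroutine is driven by something like this; main is just an illustrative wrapper, and url / cookies come from elsewhere in my code:)

    import asyncio
    import aiohttp

    async def main(url, cookies):
        # one session reused for the request(s)
        async with aiohttp.ClientSession() as session:
            await get_data(session, url, cookies)

    # asyncio.run(main(url, cookies))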
These are the headers I use for all the requests:
    HEADERS = {
        'User-Agent': 'PostmanRuntime/7.29.0',
        'Host': 'www.dnb.com',
        'Connection': 'keep-alive',
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate, br'
    }
I have compared the request headers of requests.get and aiohttp and they are identical. Is there any reason the results are different? If so, why?
EDIT: I've checked the httpx module and the problem occurs there as well, both with httpx.Client() and httpx.AsyncClient().

    response = httpx.request('GET', url, headers={**HEADERS, 'Cookie': cookies})

doesn't work either (the non-async form).
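(The async attempt looked roughly like this; get_data_httpx is just an illustrative name, using the same HEADERS and cookies as above:)

    import httpx

    async def get_data_httpx(url, cookies):
        async with httpx.AsyncClient() as client:
            response = await client.get(url, timeout=5, headers={**HEADERS, 'Cookie': cookies})
            print(response.text)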
I tried capturing packets with Wireshark to compare requests and aiohttp.
[Wireshark screenshots omitted: the server's responses and the captured requests and aiohttp packets.]
If the site accepts the packets sent by requests, then you could try making the aiohttp packet identical by setting the headers explicitly:
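A rough sketch of what that could look like, assuming the HEADERS dict and cookies string from the question (fetch is just an illustrative name). skip_auto_headers tells aiohttp not to fall back to its own defaults for those headers, so only the values you list are sent, largely in the order you give them:

    import aiohttp

    async def fetch(session: aiohttp.ClientSession, url, cookies):
        async with session.get(
            url,
            headers={**HEADERS, 'Cookie': cookies},
            # make sure aiohttp never falls back to its own defaults for these
            skip_auto_headers=('Accept', 'Accept-Encoding', 'User-Agent'),
            timeout=aiohttp.ClientTimeout(total=5),
        ) as response:
            return await response.text()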
If you haven't already, I suggest capturing the request with Wireshark to make sure aiohttp isn't messing with your headers.
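You can also check from Python itself: aiohttp exposes the headers it actually sent on the response object, which should match what shows up in the capture (response here is the ClientResponse from the question's get_data):

    # inside get_data, once the response has been received
    print(response.request_info.headers)  # the headers aiohttp actually sent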
You can also try other user agent strings, or send the headers in a different order. The order is not supposed to matter, but some sites check it anyway as part of their bot protection (for example in this question).