aiohttp and requests give different responses for the same url and parameters

I am trying to download hundreds of files from NexusMods; most are hundreds of mebibytes (1 MiB = 1048576 bytes) in size, and many are gibibytes (1 GiB = 1073741824 bytes) in size.

I am using aiohttp + aiofiles to download them. My code works, but the whole process is complicated by my network conditions: long story short, I was born in China, I am still behind the Great Firewall of China (GFW), and the VPNs I use are constantly throttled by it.

It is extremely easy for the downloads to hang and freeze all progress: connections go stale, the download speed drops to zero, and the program halts execution waiting for data that will never arrive, without throwing any exception. It just won't time out.
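What I would want is for a stalled read to raise an exception instead of hanging forever. aiohttp can express that with a ClientTimeout; a minimal sketch (the 30-second values are just illustrations, not what I actually use):

import asyncio
import aiohttp

# sock_read bounds the wait for each chunk of the response body, so a stalled
# connection raises asyncio.TimeoutError instead of hanging forever; total is
# left unset because a multi-gibibyte download can legitimately take hours.
TIMEOUT = aiohttp.ClientTimeout(total=None, sock_connect=30, sock_read=30)


async def fetch(url: str) -> bytes:
    async with aiohttp.ClientSession(timeout=TIMEOUT) as session:
        async with session.get(url) as resp:
            return await resp.read()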

Using an external downloader prevents these problems, but those downloaders only have GUIs, so they are hard to automate and hard to integrate with my own PyQt6 GUI application.

So I tried editing the hosts file and disconnecting the VPN. This makes pings faster, and the requests library downloads successfully, but aiohttp can't download the file because it somehow receives a different response for exactly the same parameters...

Steps to reproduce the error:

Assuming you are running Windows 10,

open the C:\Windows\System32\drivers\etc\hosts file (your editor must be run with administrative privileges)

add the following line, then save:

45.150.242.245 files.nexus-cdn.com

run the following commands in cmd.exe:

ipconfig /release
ipconfig /flushdns
ipconfig /renew

Now paste these lines of code into your Python interpreter (you must have the relevant libraries installed, of course):

import asyncio
import aiohttp
import json
import requests
from pathlib import Path



URL = "https://files.nexus-cdn.com/120/16892/DC%20Delight%20for%20Type3%20%20V1dot1-16892-V1-1.rar?md5=V9HymdUhB2Zjh3GoDsm3Qg&expires=1709378564&user_id=114232553&rip=45.150.242.245"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0"
}
print(requests.head(URL, headers=HEADERS).headers)


async def test():
    async with aiohttp.ClientSession(
        # Without ssl=False, aiohttp raises an SSL error after the hosts
        # change (see the notes at the end of the question).
        headers=HEADERS, connector=aiohttp.TCPConnector(ssl=False)
    ) as session:
        async with session.head(url=URL) as resp:
            return resp.headers


print(asyncio.run(test()))

I don't know what you will see, but for me the output is always this (the requests headers first, then the aiohttp headers):

{'Server': 'nginx/1.24.0', 'Date': 'Sat, 02 Mar 2024 08:19:09 GMT', 'Content-Type': 'application/x-rar-compressed', 'Content-Length': '108004175', 'Last-Modified': 'Wed, 07 Oct 2015 12:58:46 GMT', 'Connection': 'keep-alive', 'ETag': '"56151706-670034f"', 'Expires': 'Thu, 31 Dec 2037 23:55:55 GMT', 'Cache-Control': 'max-age=315360000', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Accept-Ranges': 'bytes'}
<CIMultiDictProxy('Content-Type': 'text/plain; charset=utf-8', 'Strict-Transport-Security': 'max-age=63072000', 'Vary': 'Origin', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-Xss-Protection': '1; mode=block', 'Date': 'Sat, 02 Mar 2024 08:19:10 GMT', 'Content-Length': '19')>

Somehow aiohttp can't download the file. Note that the aiohttp response is 'text/plain; charset=utf-8' with a Content-Length of 19, so it looks like a short error message rather than the RAR file that requests sees.
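For completeness, the status and body of that short response can be inspected the same way; a sketch (inspect is just an illustrative name, and a GET is used because a HEAD response has no body to read; in the failing case the body is only 19 bytes anyway):

async def inspect():
    async with aiohttp.ClientSession(
        headers=HEADERS, connector=aiohttp.TCPConnector(ssl=False)
    ) as session:
        async with session.get(url=URL) as resp:
            # Whatever the 19-byte text/plain payload says should hint at
            # what the server objects to.
            return resp.status, await resp.read()


print(asyncio.run(inspect()))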

The download link will expire, and once it has expired you will get 403 responses. The following code is used to generate the download link programmatically:

FIELDS = (
    "_app_session",
    "fwroute",
    "jwt_fingerprint",
    "member_id",
    "pass_hash",
    "sid_develop",
)


def load_cookies_list(file: str) -> list:
    # The file holds the six cookie values per account, one per line,
    # in the order given by FIELDS (see below for how to obtain them).
    lines = Path(file).read_text().splitlines()
    return [dict(zip(FIELDS, lines[i : i + 6])) for i in range(0, len(lines), 6)]


COOKIES = load_cookies_list("D:/cookies_list.txt")

DOWNLOAD_LINK_GENERATOR = (
    "https://www.nexusmods.com/Core/Libs/Common/Managers/Downloads?GenerateDownloadUrl"
)


def generate_download_link(file_id: int, game_id: int) -> str:
    # POST the file and game ids with a logged-in account's cookies;
    # the endpoint answers with JSON containing the signed CDN url.
    resp = requests.post(
        url=DOWNLOAD_LINK_GENERATOR,
        data={"fid": file_id, "game_id": game_id},
        cookies=COOKIES[0],
    )
    return json.loads(resp.content)["url"]

You need a NexusMods account. Go to www.nexusmods.com, log in to your account, press F12, and find the cookies. Where they live depends on your browser: in Firefox they are under the Storage tab, in Chrome under the Application tab.

You will need to copy the values of all the necessary cookies listed in the code (double click, Ctrl+C, then Ctrl+V) into a text file, one value per line and in the listed order, then save the file and change the path in the code.
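For example, for a single account the file is six lines long, in exactly this order (the values below are placeholders, not real cookies); a second account's values would simply follow on lines 7 to 12, and so on:

<value of _app_session>
<value of fwroute>
<value of jwt_fingerprint>
<value of member_id>
<value of pass_hash>
<value of sid_develop>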

Now you can copy and paste the code into the console and run it.

The download link is generated by generate_download_link(1000010043, 120). If the download link has expired, repeat the procedure above to get a new one.

Now I have verified that aiohttp only gets a different response when I change the hosts file. If I undo the change and run ipconfig again, print(asyncio.run(test())) gives the correct output, but the latency is much higher:

<CIMultiDictProxy('Server': 'nginx/1.24.0', 'Date': 'Sat, 02 Mar 2024 08:59:26 GMT', 'Content-Type': 'application/x-rar-compressed', 'Content-Length': '108004175', 'Last-Modified': 'Wed, 07 Oct 2015 12:58:46 GMT', 'Connection': 'keep-alive', 'Etag': '"56151706-670034f"', 'Expires': 'Thu, 31 Dec 2037 23:55:55 GMT', 'Cache-Control': 'max-age=315360000', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Accept-Ranges': 'bytes')>

Why does aiohttp give a different response when I change the hosts file, and how do I fix this?


Reply to the first comment: of course I know how to download the file asynchronously, and yes, of course I know I am doing a HEAD request; see my previous question for the downloader implementation.

I have changed the implementation to add error handling that automatically closes the connections and restarts the download, but the resuming functionality doesn't work.

I don't know why, but if I change the hosts file, aiohttp throws an SSL error unless I pass a connector that disables SSL verification, and even then the body I read back is exactly b'' (shown below). requests doesn't have either problem when I change the hosts file.

async def test():
    async with aiohttp.ClientSession(
        headers=HEADERS, connector=aiohttp.TCPConnector(ssl=False)
    ) as session:
        async with session.head(url=URL) as resp:
            return await resp.read()


print(asyncio.run(test()))
b''

After I change the hosts file back, aiohttp can download the file normally:

async def test1():
    async with aiohttp.ClientSession(
        headers=HEADERS, connector=aiohttp.TCPConnector(ssl=False)
    ) as session:
        async with session.get(url=URL, headers={"Range": "bytes=0-127"}) as resp:
            return await resp.read()

print(asyncio.run(test1()))
b'Rar!\x1a\x07\x00\xcf\x90s\x00\x00\r\x00\x00\x00\x00\x00\x00\x00\xbc\x96t\xc0\x90W\x00t\x0c\x00\x00\xb0\x1b\x00\x00\x02\x1a\xd9z\x9d\xd9cGG\x1d32\x00 \x00\x00\x00DC Delight for Type3 v1.1\\DC delight readme v1.txt\x00\xb0\xc9\x8bX\x12!\x91\x0c\xcc\xd0\xfc\x95\xd5{\xe5T\xf0|\xe9\x9aR\xaa-&\x9f'

But if I change the hosts file again, aiohttp cannot download the file while requests still can, and the latency is much lower.
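One workaround I am considering, instead of editing the hosts file, is pinning the hostname inside aiohttp itself with a custom resolver, which should leave SNI and certificate verification intact; this is an untested sketch (PinnedResolver is my own name for it):

import socket

import aiohttp
from aiohttp.abc import AbstractResolver
from aiohttp.resolver import DefaultResolver


class PinnedResolver(AbstractResolver):
    """Resolve files.nexus-cdn.com to a fixed IP, everything else normally."""

    def __init__(self) -> None:
        self._fallback = DefaultResolver()

    async def resolve(self, host, port=0, family=socket.AF_INET):
        if host == "files.nexus-cdn.com":
            # Same record shape that aiohttp's bundled resolvers return.
            return [{
                "hostname": host,
                "host": "45.150.242.245",
                "port": port,
                "family": family,
                "proto": 0,
                "flags": socket.AI_NUMERICHOST,
            }]
        return await self._fallback.resolve(host, port, family)

    async def close(self) -> None:
        await self._fallback.close()


# Usage:
# session = aiohttp.ClientSession(
#     headers=HEADERS, connector=aiohttp.TCPConnector(resolver=PinnedResolver())
# )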


I haven't fixed this issue, but I have finally added the resume functionality to my code properly: the program now automatically resumes downloading whenever there are problems, and the downloaded files arrive intact.
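For reference, the resume logic amounts to this kind of loop; a simplified sketch of the approach rather than my exact code (download_with_resume is an illustrative name, and HEADERS is the User-Agent dict from above):

import asyncio

import aiofiles
import aiohttp
from pathlib import Path


async def download_with_resume(url: str, dest: str, chunk_size: int = 1 << 16) -> None:
    # sock_read turns a stalled connection into asyncio.TimeoutError.
    timeout = aiohttp.ClientTimeout(total=None, sock_connect=30, sock_read=30)
    path = Path(dest)
    while True:
        offset = path.stat().st_size if path.exists() else 0
        headers = dict(HEADERS)
        if offset:
            # Ask the server to continue from where the partial file ends.
            headers["Range"] = f"bytes={offset}-"
        try:
            async with aiohttp.ClientSession(timeout=timeout) as session:
                async with session.get(url, headers=headers) as resp:
                    if resp.status not in (200, 206):
                        raise RuntimeError(f"unexpected status {resp.status}")
                    # 206 means the server honoured the Range header; a plain
                    # 200 means it did not, so the file is started over.
                    mode = "ab" if resp.status == 206 else "wb"
                    async with aiofiles.open(path, mode) as f:
                        async for chunk in resp.content.iter_chunked(chunk_size):
                            await f.write(chunk)
            return  # completed without stalling
        except (aiohttp.ClientError, asyncio.TimeoutError):
            continue  # stalled or dropped: reopen and resume from the new offset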

And I have found that the download link generator I pasted here contains bugs, so I have fixed it.

That download link generator contains bugs because it is an earlier version. I had many tabs open in Visual Studio Code for all the different versions of the downloader, because I couldn't get the auto-resume functionality working and tried many different approaches; some versions don't work and I haven't fixed them. I copied my code from one of those earlier versions.
