How can I download a file from a URL using Python when requests is redirected to an error page?


I'm attempting to download the following file using Python: Dallas DCAD 2024 Appraisals

The download works in my browser, but when I try it in Python I'm redirected to an error page. The response content is the HTML of Errors/ErrorPage.aspx instead of the zip binary data.

Here is what I've tried:

import requests

url = 'https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA%20PRODUCTS\DCAD2024_CURRENT.ZIP'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
    }
r = requests.get(url, allow_redirects=True, headers=headers, timeout=None)
print(f"URL: {r.url}")
print(f"Status Code: {r.status_code}")
for i,h in enumerate(r.history):
    print(f"History[{i}] URL: {h.url}")
    print(f"History[{i}] Status: {h.status_code}")
    print(f"History[{i}] Headers: {h.headers}")

Output:

URL: https://www.dallascad.org/Errors/ErrorPage.aspx?aspxerrorpath=/ViewPDFs.aspx
Status Code: 200
History[0] URL: https://www.dallascad.org/ViewPDFs.aspx?type=3&id=%5CDCAD.ORG%5CWEB%5CWEBDATA%5CWEBFORMS%5CDATA%20PRODUCTS%5CDCAD2024_CURRENT.ZIP
History[0] Status: 302
History[0] Headers: {'Cache-Control': 'private', 'Content-Type': 'text/html; charset=utf-8', 'Location': '/Errors/ErrorPage.aspx?aspxerrorpath=/ViewPDFs.aspx', 'Server': 'Microsoft-IIS/8.5', 'Content-Disposition': 'attachment;filename=DCAD2024_CURRENT.ZIP', 'X-AspNet-Version': '4.0.30319', 'X-Powered-By': 'ASP.NET', 'Date': 'Tue, 26 Mar 2024 14:35:36 GMT', 'Content-Length': '168'}

Best answer, by SIGHUP:

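The id parameter starts with two backslashes, and both are significant. In a normal Python string literal, \\ is an escape sequence for a single backslash, so your request sends only one leading backslash (note the single %5C after id= in the History[0] URL of your output). Write the URL as a raw string so that both backslashes are sent.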
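
To see the difference without touching the network, here is a minimal sketch that percent-encodes just the start of the id value with the standard library's urllib.parse.quote. The shortened value and the variable names plain and raw are only for illustration.

```python
from urllib.parse import quote

# Only the start of the id value from the question, to keep the demo short.
plain = '\\DCAD.ORG'   # normal literal: the escape \\ collapses to ONE backslash
raw = r'\\DCAD.ORG'    # raw literal: both backslashes are kept

print(quote(plain))  # %5CDCAD.ORG
print(quote(raw))    # %5C%5CDCAD.ORG
```

The first line of output matches the single %5C seen in the redirected request; the raw string produces the two that the path needs.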

The site does not require any custom headers, so the User-Agent header can be dropped.

Therefore:

import requests

# Raw string: the two backslashes after id= are sent as-is
url = r"https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA%20PRODUCTS\DCAD2024_CURRENT.ZIP"

# stream=True avoids loading the whole archive into memory
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open("DCAD2024_CURRENT.ZIP", "wb") as output:
        # Write the body to disk in 4 KiB chunks
        for chunk in response.iter_content(4096):
            output.write(chunk)
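
Because the error page in the question came back with status 200, raise_for_status() alone would not have caught it. As an optional extra check, you could confirm that what you received really is a zip archive. The sketch below uses the standard library's zipfile.is_zipfile; the helper name looks_like_zip and the sample data are hypothetical, and the demo builds a tiny archive in memory rather than downloading anything.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes have a valid zip end-of-archive record."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a tiny archive in memory to stand in for a successful download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")
good = buffer.getvalue()

# Stand-in for the HTML error page returned with status 200.
bad = b"<html><body>Error page</body></html>"

print(looks_like_zip(good))  # True
print(looks_like_zip(bad))   # False
```

For a file already saved to disk, zipfile.is_zipfile("DCAD2024_CURRENT.ZIP") performs the same check directly on the path.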