How to download thousands of excel files from website?


This is the website in question.

I want to download every year's Excel files from 2009 onwards.

I don't want to click all of them manually — any help? I don't have a lot of coding experience, so please keep it simple. I know Python is an option, but I have no experience with it; VBA could work, but I'm not having any luck there.

For now, all I want is to download them all, but if there is also a way to consolidate the files, that would be very helpful too. In full honesty, I can't code at all, but it seems that is the only way to do this task. It might take too long to learn a whole programming language, and I need these files downloaded as soon as possible.

If this is too complicated, please advise if I should bite the bullet and manually do it.

Thanks.


There is 1 answer below.

Federicofkt (accepted answer)

Assuming that you already have an environment in which you can run the code (if not, install Anaconda and use Jupyter Notebook), this code does the trick.

The first time, you have to install the libraries (run this only once; in a plain terminal, drop the leading %):

%pip install bs4
%pip install requests

Then you can run the code:

import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin


def download_excel_files(url, download_path):
    response = requests.get(url)

    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')

        # Collect every link on the page and keep only the Excel files
        links = soup.find_all('a', href=True)

        for link in links:
            file_url = link['href']
            if file_url.endswith(('.xls', '.xlsx')):
                # Links may be relative, so resolve them against the page URL
                full_file_url = urljoin(url, file_url)
                file_response = requests.get(full_file_url)
                if file_response.status_code == 200:
                    file_name = os.path.join(download_path, os.path.basename(file_url))
                    with open(file_name, 'wb') as f:
                        f.write(file_response.content)
                    print(f"File '{file_name}' successfully downloaded")
                else:
                    print(f"Could not download '{full_file_url}'")
    else:
        print("Page error")

Then loop over the years and call the function like so:

base_url = 'https://clientportal.jse.co.za/downloadable-files?RequestNode=/Safex/agriculture.stats/'


for year in range(2009, 2025):  # 2009 through 2024
    url = base_url + str(year)
    download_folder = str(year)
    os.makedirs(download_folder, exist_ok=True)
    download_excel_files(url, download_folder)
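You also asked about consolidating the files. A minimal sketch with pandas (install it with `%pip install pandas openpyxl` first) — assuming every workbook shares the same column layout, and using a helper name of my own choosing:

```python
import glob
import os

import pandas as pd


def consolidate_excel_files(root_folder, output_file):
    """Stack every .xls/.xlsx found under root_folder into one sheet."""
    frames = []
    # '**' with recursive=True also descends into the per-year subfolders
    pattern = os.path.join(root_folder, '**', '*.xls*')
    for path in sorted(glob.glob(pattern, recursive=True)):
        df = pd.read_excel(path)  # needs openpyxl (.xlsx) or xlrd (.xls)
        df['source_file'] = os.path.basename(path)  # remember where each row came from
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)
    combined.to_excel(output_file, index=False)
    return combined
```

For example, `consolidate_excel_files('.', 'all_years.xlsx')` run from the folder containing the year subfolders would produce a single workbook (write the output outside that folder, or to a distinct name, so a rerun doesn't re-read it). If the yearly files have different columns, `pd.concat` will still work but will fill missing columns with NaN.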

Since I assume you're only using this code for this one task, I won't explain every step. If you need anything more, please ask.