I want to write a download manager in Python, like JDownloader, that downloads files for you easily. But not every file has a download URL in the document. How can I get the download URLs if the files are "invisible" in the document? I read on the internet that network sniffing might work, but it doesn't seem to be what I need. JDownloader just checks for a second and directly finds what you need. How does this work? For example: https://speed.hetzner.de/
I am a beginner btw.
Looking at your example page, it has three hrefs that point to files. When you look at an href, sometimes you can tell it is a file from its extension. In general, though, a website can do some server-side processing and then return a file, so the extension alone is not reliable. Sometimes the URLs are not files at all; they point to some other page.
So, you have two things to do: first, collect all the hrefs on the page; second, work out which of them actually return a file.
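The first part, collecting the links, can be sketched with the standard library alone. This is only an illustration: the names LinkCollector and extract_links are made up for this example, and it assumes you already have the page's HTML as a string (for instance from requests.get(url).text). It only sees links that are present in the HTML itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links such as "100MB.bin" become absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The result is a plain list of absolute URLs, which is the input for the second part.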
To perform the second part, you can use the Python requests library to get the content type. Here is a small example:
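A minimal sketch of such a check, assuming the third-party requests package is installed. The helper names fetch_headers and media_type, and the 10-second timeout, are choices made for this illustration and not part of requests itself.

```python
import requests  # third-party: pip install requests


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers.

    A HEAD request asks for the headers only, so no file body is downloaded.
    """
    # requests.head does not follow redirects unless asked to.
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()
    return response.headers


def media_type(headers):
    """Return the Content-Type without parameters such as '; charset=utf-8'."""
    # response.headers is case-insensitive; a plain dict needs this exact key.
    return headers.get("Content-Type", "").split(";", 1)[0].strip().lower()
```

Calling fetch_headers on one of the file links from the example page (for instance https://speed.hetzner.de/100MB.bin, assuming that link is still there) and printing the result gives you the response.headers mapping to inspect.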
If you look at response.headers here, you can see the 'Content-Type', which is set to 'application/octet-stream'. This field should be used to pick out the files. There are other content types that you have to look for in order to decide whether a URL is downloadable or not. Once you have this filtered list, it is the list of downloadable files on this webpage.

Notice that I am using requests.head to get the content type. Use a HEAD request to get meta information about a URL without fetching the body. If you do a GET/POST instead, the server starts sending the whole file, and the request might time out.
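The filtering step could look roughly like this. It is a sketch under assumptions: PAGE_TYPES, is_downloadable and filter_downloadable are hypothetical names, and treating everything that is not an HTML page as a file is a simplification you will probably want to refine for real sites.

```python
# Media types that normally mean "a page to display", not "a file to save".
# This set is an illustrative assumption, not an exhaustive rule.
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def is_downloadable(content_type):
    """Guess from a Content-Type header value whether the URL serves a file."""
    # Drop parameters such as "; charset=UTF-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # A missing content type gives no information, so it is not counted as a file.
    return bool(media_type) and media_type not in PAGE_TYPES


def filter_downloadable(content_types):
    """Keep only the URLs whose reported content type looks like a file.

    content_types maps each URL to the Content-Type value the server returned.
    """
    return [url for url, ctype in content_types.items() if is_downloadable(ctype)]
```

Here content_types would be built by sending a HEAD request to every collected link and storing the Content-Type of each response.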