Is it possible to web scrape a blob URL from a website in Python?

I am trying to extract a CSV file that is exposed through a blob URL on this page, using Beautiful Soup: https://worldpopulationreview.com/country-rankings/exports-by-country

Here's my code:

html = requests.get('https://worldpopulationreview.com/country-rankings/exports-by-country').text
links = BeautifulSoup(html, 'html.parser').find_all(download="csvData.csv")
exports = pd.read_csv(io.StringIO(requests.get(links)))

What I got was an exception, and the href contained no blob link, even though the blob URL does exist when I inspect the HTML in my browser.

Since the href does not show the blob URL, I decided to send a GET request to the blob URL itself instead of scraping it, but then this exception appears:

requests.exceptions.InvalidSchema: No connection adapters were found for 'blob:https://worldpopulationreview.com/850ac28e-9cd9-46b6-9423-e96a0bd7e938'

Is there a way to web scrape blob URLs?

Best answer, by cuzi:

These blob URLs are created only in the browser, usually with JavaScript; they don't exist on the server at all. That is why you cannot download them with requests.
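The InvalidSchema error from the question follows from this: by default, requests only knows how to connect to http:// and https:// URLs, and a blob URL has the scheme "blob". A minimal sketch of checking the scheme before making a request, using only the standard library (the helper name is_http_url is made up for this illustration):

```python
from urllib.parse import urlsplit

# Schemes that requests can fetch out of the box.
HTTP_SCHEMES = {"http", "https"}


def is_http_url(url: str) -> bool:
    """Return True if the URL uses a scheme that requests handles by default."""
    return urlsplit(url).scheme.lower() in HTTP_SCHEMES


# The two URLs from the question.
blob_url = "blob:https://worldpopulationreview.com/850ac28e-9cd9-46b6-9423-e96a0bd7e938"
page_url = "https://worldpopulationreview.com/country-rankings/exports-by-country"

# Print each URL's scheme and whether requests could fetch it.
for url in (blob_url, page_url):
    print(urlsplit(url).scheme, is_http_url(url))
```

Even with the "blob:" prefix stripped, the remaining address would not help, because the data lives only in the memory of the browser tab that created it.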

You could run a JavaScript snippet in the browser console to get the content. Here is an example of how to fetch a blob URL in JavaScript: https://stackoverflow.com/a/52410044/

If you need to do this automatically, you could create a userscript to do it, or use an automation tool like AutoHotkey to click the download link automatically.
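Since the question asks about Python, another possible route is to drive a real browser from Python and run the JavaScript fetch inside the page. This is a different tool from the userscript or AutoHotkey suggestions above, and the following is only a rough sketch under several assumptions: Selenium and a Chrome driver are installed, the download link carries the blob URL in its href once the page's scripts have run, and the file is UTF-8 CSV. The names FETCH_BLOB_JS, data_url_to_rows and fetch_blob_csv are hypothetical.

```python
import base64
import csv
import io

# JavaScript run inside the page: fetch the blob URL, then pass the content back
# to Python as a base64 data URL through Selenium's async callback (last argument).
FETCH_BLOB_JS = """
const url = arguments[0];
const done = arguments[arguments.length - 1];
fetch(url)
  .then(response => response.blob())
  .then(blob => {
    const reader = new FileReader();
    reader.onloadend = () => done(reader.result);
    reader.readAsDataURL(blob);
  })
  .catch(error => done("ERROR: " + error));
"""


def data_url_to_rows(data_url: str) -> list[list[str]]:
    """Decode a base64 data URL (as produced by FileReader) into CSV rows."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError(f"unexpected result from the browser: {data_url[:60]!r}")
    text = base64.b64decode(payload).decode("utf-8")  # assumes UTF-8 content
    return list(csv.reader(io.StringIO(text)))


def fetch_blob_csv(page_url: str, link_selector: str) -> list[list[str]]:
    """Open the page in a browser and read the blob behind the download link."""
    # Imported here so the decoding helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.implicitly_wait(10)  # give the page's scripts time to add the link
        driver.get(page_url)
        link = driver.find_element(By.CSS_SELECTOR, link_selector)
        blob_url = link.get_attribute("href")  # assumed to be the blob: URL
        result = driver.execute_async_script(FETCH_BLOB_JS, blob_url)
        return data_url_to_rows(result)
    finally:
        driver.quit()
```

It might be called as fetch_blob_csv("https://worldpopulationreview.com/country-rankings/exports-by-country", 'a[download="csvData.csv"]'). If the site only creates the blob after a button is clicked, that click would have to be added before the link is read.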