Is there a way to extract data from response.content in Python?


I'm trying to figure out how to scrape/extract an image URL from response.content.

This is the tag whose URL I'm trying to extract: <img src="/Content/images/asos-logo-2022-93x28.png"

The problem is that everything after the /Content/images/ part can change...

Any help is appreciated!


You can use Beautiful Soup for this:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get("https://stackoverflow.com/q/71636643/1416672")
>>> html = r.text
>>> soup = BeautifulSoup(html, 'html.parser')
>>> for item in soup.find_all('img', src=True): print(item['src'])  # src=True skips <img> tags without a src
... 
https://cdn.sstatic.net/Img/teams/teams-illo-free-sidebar-promo.svg?v=47faa659a05e
https://www.gravatar.com/avatar/f96b33e2715bf57ba8e434140f0aeeba?s=64&d=identicon&r=PG&f=1
/posts/71636643/ivc/9bb6
https://sb.scorecardresearch.com/p?c1=2&c2=17440561&cv=3.6.0&cj=1

If you want to match a specific image, check the Beautiful Soup docs on how to search by CSS class or by any other CSS selector.
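
For the image in the question, one possible approach is a CSS attribute selector that matches on the fixed /Content/images/ prefix and ignores whatever follows it. The sketch below is only an illustration: the inline HTML stands in for response.text, and the helper name find_image_urls and the base URL https://example.com/ are assumptions, not something taken from the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; the real page will differ.
html = """
<html><body>
<img src="/Content/images/asos-logo-2022-93x28.png" alt="logo">
<img src="/static/other.png">
<img alt="image without a src attribute">
</body></html>
"""


def find_image_urls(html, prefix="/Content/images/"):
    """Return the src of every <img> whose src starts with `prefix`."""
    soup = BeautifulSoup(html, "html.parser")
    # [src^="..."] is the CSS "attribute starts with" selector, so the
    # part of the path after the prefix is free to change.
    return [img["src"] for img in soup.select(f'img[src^="{prefix}"]')]


urls = find_image_urls(html)
print(urls)

# The matched paths are relative; join them with the page URL
# (an assumed placeholder here) to get absolute URLs.
base_url = "https://example.com/"
absolute_urls = [urljoin(base_url, u) for u in urls]
print(absolute_urls)
```

With a real request you would pass r.text (or r.content) instead of the inline string and use r.url as the base for urljoin.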