python scrape imdb's image

714 Views Asked by TangPing At 02 January 2022 at 00:59

I want to scrape the images of the first 100 movies on IMDb. The scrape seems to succeed, but it gives me the wrong URL.

IMDb page: https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating

import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# One div per movie in the results list
movie_data = soup.find_all('div', attrs={'class': 'lister-item mode-advanced'})

for store in movie_data:
    imageDiv = store.find('div', {'class': 'lister-item-image float-left'})
    img = imageDiv.a.img['src']  # this is the URL that comes back wrong

But img always gets the wrong URL.


There is 1 best solution below

Tim Roberts On 02 January 2022 at 01:05

When doing web scraping, you need to look at the HTML to see what it's doing. All of those images load the fake "movie cell" image to start with. That's the src attribute in their <img> tag, and that's exactly what you're fetching.

The actual movie thumbnail URL is stored in a loadlate attribute, which JavaScript fetches after the page loads. This lets the page itself load more quickly and fill in the images later.

So, use this instead:

    img = imageDiv.a.img['loadlate']
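To see the difference without hitting the network, here is a minimal sketch that runs against a made-up HTML snippet. The markup, the example.com URLs, and the helper name thumbnail_urls are all assumptions for illustration, not IMDb's real page. It prefers loadlate and falls back to src, in case some images are not lazy-loaded.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure described above; URLs are placeholders.
SAMPLE_HTML = """
<div class="lister-item mode-advanced">
  <div class="lister-item-image float-left">
    <a href="/title/tt0000001/">
      <img src="https://example.com/placeholder.png"
           loadlate="https://example.com/poster-1.jpg">
    </a>
  </div>
</div>
<div class="lister-item mode-advanced">
  <div class="lister-item-image float-left">
    <a href="/title/tt0000002/">
      <img src="https://example.com/poster-2.jpg">
    </a>
  </div>
</div>
"""


def thumbnail_urls(html):
    """Return one image URL per list item, preferring loadlate over src."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for item in soup.find_all("div", class_="lister-item mode-advanced"):
        img = item.find("img")
        if img is None:
            continue  # no image in this item, skip it
        # Tag.get returns None for a missing attribute instead of raising KeyError.
        urls.append(img.get("loadlate") or img.get("src"))
    return urls


print(thumbnail_urls(SAMPLE_HTML))
```

Using img.get(...) rather than img[...] is a defensive choice: indexing raises KeyError if an image has no loadlate attribute, while get lets you fall back to src.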
