I want to scrape the poster images of the first 100 movies on IMDb. The scraping seems to succeed, but it gives me the wrong image URLs.
IMDb page: https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating
import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# One <div> per movie in the result list
movie_data = soup.find_all('div', attrs={'class': 'lister-item mode-advanced'})
for store in movie_data:
    imageDiv = store.find('div', {'class': 'lister-item-image float-left'})
    img = imageDiv.a.img['src']
However, img always ends up holding the wrong URL.
When doing web scraping, you need to look at the HTML to see what the page is actually doing. All of those images load a placeholder "movie cell" image to start with. That placeholder is the `src` attribute in their `<img>` tag, and that's exactly what you're fetching. The actual movie thumbnail is stored in a `loadlate` attribute, which JavaScript reads after the page loads. This lets the page load more quickly and fill in the images later.

So, read that attribute instead: `img = imageDiv.a.img['loadlate']`
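For illustration, here is a minimal sketch of the same idea that prefers `loadlate` and falls back to `src` when an image has no lazy-load attribute. It swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages, and it parses a small hypothetical HTML snippet (made-up `example.com` URLs) modeled on the markup described above, not the live IMDb page. The helper names `poster_url` and `PosterParser` are illustrative. In the BeautifulSoup loop from the question, the equivalent change would be `img = imageDiv.a.img.get('loadlate') or imageDiv.a.img.get('src')`.

```python
from html.parser import HTMLParser

# Hypothetical markup modeled on the description above: `src` holds a
# placeholder and `loadlate` holds the real poster URL. URLs are made up.
SAMPLE_HTML = """
<div class="lister-item mode-advanced">
  <div class="lister-item-image float-left">
    <a href="#"><img src="https://example.com/placeholder.png" loadlate="https://example.com/poster-1.jpg"></a>
  </div>
</div>
<div class="lister-item mode-advanced">
  <div class="lister-item-image float-left">
    <a href="#"><img src="https://example.com/poster-2.jpg"></a>
  </div>
</div>
"""


def poster_url(attrs):
    """Return the lazy-loaded URL if present, otherwise fall back to src."""
    return attrs.get("loadlate") or attrs.get("src")


class PosterParser(HTMLParser):
    """Collect one URL per <img> tag, preferring the `loadlate` attribute."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # `attrs` arrives as a list of (name, value) pairs; turn it into a dict.
        if tag == "img":
            url = poster_url(dict(attrs))
            if url:
                self.urls.append(url)


parser = PosterParser()
parser.feed(SAMPLE_HTML)
for url in parser.urls:
    print(url)
```

This collects every `<img>` it sees, which is enough for the sample; on a real page you would keep the question's filtering by the `lister-item-image` div so that only poster images are picked up.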