I am unable to scrape all the URLs from the newsletter archive; I only get the URLs on the first page. This is the link to the website: https://news.matdesousa.com/
from bs4 import BeautifulSoup
import requests

url = "https://news.matdesousa.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}

# Make a request to the URL to get the HTML content
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Find inner divs within the specified class
inner_divs = soup.find_all('div', class_='group h-full overflow-hidden transition-all shadow-none hover:shadow-none rounded-lg')

# Extract links from each inner div
for inner_div in inner_divs:
    link = inner_div.find('a')['href']
    print(link)
For this specific question you will need control over the browser. The extra articles are loaded dynamically when the "Load More" button is clicked, so `requests` will never see them: it only receives the initial HTML. Selenium or Puppeteer are good for automating the things a human would do in a browser, such as clicking, scrolling, or moving the mouse. You will need to click the Load More button repeatedly until it disappears. You can locate it with an XPath expression, which tends to be more reliable for scraping than brittle utility-class selectors.