Once I've set the URL variable that BeautifulSoup scrapes, can I update that variable inside a loop so the script moves on to new pages without me entering each URL manually (Python, BeautifulSoup)?
I tried writing a loop that pulls the next-page URL out of the current page and then replaces the original URL with the one it scraped. It runs without any errors, but it won't update the original URL with the new URL scraped from the website. Hoping to find some BeautifulSoup experts :)
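Rebinding a variable inside a `while` loop does take effect on the next iteration. The minimal sketch below shows the pattern offline: a dictionary of fake pages (the hypothetical `FAKE_PAGES`) stands in for `requests.get`, and a hypothetical `next_link_of` helper stands in for the `soup.find(...)` lookup, so it runs without network access or BeautifulSoup. It also keeps a set of visited URLs so a page whose "next" link points back to itself cannot loop forever.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for the site: each page URL maps to the relative
# href of its "next page" link, or None on the last page.
FAKE_PAGES = {
    "https://example.com/c/clearance": "/c/clearance?page=2",
    "https://example.com/c/clearance?page=2": "/c/clearance?page=3",
    "https://example.com/c/clearance?page=3": None,
}


def next_link_of(url):
    """Hypothetical replacement for fetching `url` and reading its next-page href."""
    return FAKE_PAGES.get(url)


def crawl(start_url):
    """Follow next-page links from start_url and return every URL visited, in order."""
    visited = []
    seen = set()
    url = start_url
    while url and url not in seen:
        seen.add(url)
        visited.append(url)
        href = next_link_of(url)
        # Rebinding `url` here is what makes the next iteration use the new page.
        url = urljoin(url, href) if href else None
    return visited


for page in crawl("https://example.com/c/clearance"):
    print(page)
```

If the real loop keeps fetching the same page, printing `url` at the top of each iteration shows whether the scraped next link actually differs from the current one.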
Here's my code:
import urllib.parse

import requests
from bs4 import BeautifulSoup

# `headers` isn't shown in the original snippet; a browser-like User-Agent is assumed here.
headers = {"User-Agent": "Mozilla/5.0"}

url = "https://www.academy.com/c/academy-clearance &facet=%27facet_Product%20Type%27:%27Shoes%27"
seen = set()  # URLs already fetched, so a repeated "next" link can't loop forever

while url and url not in seen:
    seen.add(url)
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")

    # Extract product links (hrefs starting with /p/)
    links = set()
    for link in soup.find_all("a"):
        href = link.get("href")
        if href and href.startswith("/p/"):
            # Resolve against the current page URL (replaces the undefined `base_url`)
            links.add(urllib.parse.urljoin(url, href))

    # Output the links
    for link in links:
        print(link)

    # Find the link to the next page
    next_page_element = soup.find("a", {"data-auid": "gotoNextPage"})
    if next_page_element and next_page_element.get("href"):
        next_page_url = urllib.parse.urljoin(url, next_page_element["href"])
        print("Next Page Link:", next_page_url)
        url = next_page_url  # rebind; the next loop iteration fetches this page
    else:
        print("No next page link found.")
        break
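Both `urljoin` calls resolve an href against whichever base URL they are given, and the choice of base matters for some href shapes. The short sketch below shows the standard-library behavior; the product path `/p/some-product` and the `?page=2` href are hypothetical examples, not values taken from the site.

```python
from urllib.parse import urljoin

page_url = "https://www.academy.com/c/academy-clearance"
site_root = "https://www.academy.com/"

# Absolute-path href: same result whichever base is used.
print(urljoin(page_url, "/p/some-product"))   # -> https://www.academy.com/p/some-product
print(urljoin(site_root, "/p/some-product"))  # -> https://www.academy.com/p/some-product

# Query-only href: joining against the page keeps its path; the root drops it.
print(urljoin(page_url, "?page=2"))   # -> https://www.academy.com/c/academy-clearance?page=2
print(urljoin(site_root, "?page=2"))  # -> https://www.academy.com/?page=2

# Already-absolute href: returned unchanged.
print(urljoin(page_url, "https://www.academy.com/c/academy-clearance?page=3"))
```

Joining against the URL of the page the link was found on is the safer default, since it matches how a browser resolves the same href.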
My goal is to scrape the website and extract product details. At the end of the code I look for the next-page link, and I want to update my original URL variable with that link.
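If the next-page anchor is missing from the HTML that `requests` receives (for instance, if it is added by JavaScript after the page loads), following links cannot work, and one alternative is to build each page URL directly. This is a minimal sketch, assuming the listing accepts a page-number query parameter; the parameter name `page` and the `example.com` URLs are hypothetical, not taken from academy.com.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page, param="page"):
    """Return `url` with query parameter `param` set to `page`.

    `param="page"` is a hypothetical name; check which parameter (if any)
    the site actually uses for pagination.
    """
    parts = urlsplit(url)
    # Keep every existing query pair except the one being replaced.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


start = "https://example.com/c/clearance?sort=price"
for page in range(1, 4):
    print(with_page(start, page))
```

In a real scraper the loop would typically stop once a page yields no `/p/` product links. Note that the `parse_qsl`/`urlencode` round trip may re-encode existing values (for example `%20` becomes `+`), so compare the generated URLs with what the site expects.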