When I run this code to get the titles and links, I get about 10x the number of results actually on the page. Any idea what I am doing wrong? Is there a way to stop the scraping when we reach the last result on the page?
Thanks!
while True:
    web = 'https://news.google.com/search?q=weather&hl=en-US&gl=US&ceid=US%3Aen'
    driver.get(web)
    time.sleep(3)
    titleContainers = driver.find_elements(by='xpath', value='//*[@class="DY5T1d RZIKme"]')
    linkContainers = driver.find_elements(by='xpath', value='//*[@class="DY5T1d RZIKme"]')
    if (len(titleContainers) != 0):
        for i in range(len(titleContainers)):
            counter = counter + 1
            print("Counter: " + str(counter))
            titles.append(titleContainers[i].text)
            links.append(linkContainers[i].get_attribute("href"))
    else:
        break
You put yourself in an infinite loop with that 'while True' statement.
The 'if (len(titleContainers) != 0):' condition will always evaluate to True once the elements are found on the page (there are roughly 100 of them), so the loop keeps reloading the same page and appending the same results over and over. You're not posting your full code, but I imagine that 'counter', 'titles' and 'links' are defined somewhere earlier. If you only want each result once, drop the outer loop entirely, or make the loop's exit depend on something that actually changes (for example, comparing 'counter' against the length of 'titleContainers').
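Here is a minimal sketch of one way to restructure it, assuming 'driver' is a Selenium 4 WebDriver (Chrome is used here just for illustration) and that you only need a single pass over the results on that page. The XPath and class names are taken from your question and may stop working whenever Google changes its markup:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumption: Chrome; any Selenium WebDriver works the same way.
driver = webdriver.Chrome()

web = 'https://news.google.com/search?q=weather&hl=en-US&gl=US&ceid=US%3Aen'
driver.get(web)
time.sleep(3)  # crude wait for the page to render; WebDriverWait would be more robust

titles = []
links = []

# Each result anchor holds both the title text and the href,
# so one pass over the elements is enough -- no outer 'while True' needed.
resultAnchors = driver.find_elements(By.XPATH, '//*[@class="DY5T1d RZIKme"]')

for counter, anchor in enumerate(resultAnchors, start=1):
    print("Counter: " + str(counter))
    titles.append(anchor.text)
    links.append(anchor.get_attribute("href"))

driver.quit()

Because the page is fetched and parsed exactly once, each result gets appended a single time. If you later need pagination, add a loop with a real termination condition (for example, checking for a "next page" element) rather than reloading the same URL forever.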