Python, Selenium web scraping error with XPath: invalid selector, ... is not a valid XPath expression, ... 'evaluate' on 'Document'

Question

211 Views Asked by VitorWAW At 23 March 2023 at 18:58

I'm following a tutorial whose task is to download pictures from Google Images using Python and Selenium, but I've run into a problem.

import bs4
import requests
from selenium import webdriver
import os
import time

chromeDriverPath=r'C:\Users\Aorus\Downloads\Z_ARCHIWUM\PythonScript\chromedriver_win32\chromedriver.exe'
driver=webdriver.Chrome(chromeDriverPath)

search_URL = 'https://www.google.com/search?q=budynki&rlz=1C1GCEU_plPL919PL919&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRyJvoo_L9AhWJxIsKHTIKDqwQ_AUoAXoECAEQAw&biw=1553&bih=724'

driver.get(search_URL)

a = input('Waiting for user input to start...')

# Scrolling all the way up
driver.execute_script('window.scrollTo(0, 0);')

page_html = driver.page_source
pageSoup = bs4.BeautifulSoup(page_html, 'html.parser')
containers = pageSoup.findAll('div', {'class':'isv-r PNCib MSM1fd BUooTd'})

len_containers = len(containers)
print('Found %s image containers'%(len_containers))

xPath1 = '//*[@id="islrg"]/div[1]/div[13]'


for i in range(1, len_containers+1):
    if i % 25 == 0:
        continue
    
    xPath2 = xPath1 + str(i)
    driver.find_element("xpath", xPath2).click()

and I got this error:

InvalidSelectorException: invalid selector: Unable to locate an element with the xpath expression //*[@id="islrg"]/div[1]/div[13]1 because of the following error:

SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.

Did I choose the wrong DIV, should I add str() or .text somewhere, or is the XPath itself bad? When I pick a single picture and call .click() on it, it works.

There are 2 best solutions below

undetected Selenium On 23 March 2023 at 19:47

This error message...

InvalidSelectorException: invalid selector: Unable to locate an element with the xpath expression //*[@id="islrg"]/div[1]/div[13]1 because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.

...implies that the string you passed as the locator is not a valid XPath expression.

This use case

The block of code you have used:

xPath1 = '//*[@id="islrg"]/div[1]/div[13]'
for i in range(1, len_containers+1):
    if i % 25 == 0:
        continue
    xPath2 = xPath1 + str(i)
    driver.find_element("xpath", xPath2).click()

effectively results in xPath2 being evaluated as:

//*[@id="islrg"]/div[1]/div[13]1

which isn't a valid XPath expression.

Solution

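To make xPath2 a valid XPath, wrap the base expression in parentheses and append the index as a positional predicate in square brackets, so that the result looks like (...)[1]:

xPath1 = '(//*[@id="islrg"]/div[1]/div[13])'
for i in range(1, len_containers+1):
    if i % 25 == 0:
        continue
    # Square brackets, not parentheses: (expr)[i] selects the i-th match of expr
    xPath2 = xPath1 + '[' + str(i) + ']'
    driver.find_element("xpath", xPath2).click()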
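
For illustration, here is a minimal sketch of the strings such a loop would build, runnable without a browser. The helper name indexed_xpath is hypothetical and not part of Selenium. It relies only on the XPath rule that a positional predicate is written in square brackets after a parenthesized expression. Whether more than one node actually matches depends on the page, so this shows the syntax only.

```python
def indexed_xpath(base, i):
    """Return an XPath selecting the i-th node (1-based) matched by `base`."""
    # Parentheses group the whole expression; [i] is a positional predicate.
    return f"({base})[{i}]"


# The base expression from the question, used here only to show the syntax.
base = '//*[@id="islrg"]/div[1]/div[13]'
for i in range(1, 4):
    print(indexed_xpath(base, i))
```

Appending a bare digit, as in base + str(i), gives a string ending in ]1, which is the form the browser rejected.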

JeffC On 24 March 2023 at 06:16 (accepted answer)

The error message shows exactly what went wrong.

The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.

You took an XPath

xPath1 = '//*[@id="islrg"]/div[1]/div[13]'

and then appended '1' to it in the line below (because i is 1)

xPath2 = xPath1 + str(i)

which becomes

'//*[@id="islrg"]/div[1]/div[13]' + '1'
'//*[@id="islrg"]/div[1]/div[13]1'

which is the exact string from the error message. The problem is that this is not a valid XPath... the final '1' at the end of the string makes it invalid.
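
One way to avoid assembling indexed XPath strings by hand is to ask Selenium for every match at once with find_elements and loop over the returned list. The sketch below assumes a driver object that exposes find_elements(by, value), as Selenium's WebDriver does. The function name click_all and its skip_every parameter are hypothetical, and "xpath" is the same locator-strategy string the question already passes to find_element.

```python
def click_all(driver, xpath, skip_every=25):
    """Click every element matched by `xpath`, skipping each `skip_every`-th one.

    Returns the number of elements that were clicked.
    """
    # One lookup returns all matches, so no index is ever glued onto the XPath.
    elements = driver.find_elements("xpath", xpath)
    clicked = 0
    for i, element in enumerate(elements, start=1):
        if i % skip_every == 0:
            continue  # mirrors the `i % 25 == 0` skip in the question
        element.click()
        clicked += 1
    return clicked
```

It would be called as click_all(driver, some_xpath), where some_xpath matches all the tiles you want. On a live page a click can change the DOM and make the remaining elements stale, so treat this as a starting point rather than a finished solution.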

After reviewing your entire script, I think there's a simpler way to approach this. Right now you've got BeautifulSoup in your script but it's not needed... you can get all of this using Selenium alone, simplifying everything.

One issue I ran into while writing this script is that the images take a moment to load. We can't use a standard WebDriverWait here because we don't know how many images are going to appear. So we write a helper function that polls the page every 100 ms to see whether the count of images has gone up. We keep looping until the count is stable, which we take to mean that all the images have loaded.

import time

def wait_for_images(locator):
    # Poll until two consecutive lookups return the same number of elements
    count = 0
    images = driver.find_elements(*locator)
    while len(images) != count:
        count = len(images)
        time.sleep(.1)
        images = driver.find_elements(*locator)

    return images
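
As a variation, here is a hedged sketch of the same polling idea with an upper time limit, so the loop still ends if the page keeps adding images. The name wait_for_stable_count and the parameters find, interval, and timeout are hypothetical. find is any zero-argument callable that returns the current list of elements, which also lets the logic be tried without a browser.

```python
import time


def wait_for_stable_count(find, interval=0.1, timeout=10.0):
    """Call `find()` until its result length stops changing or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    items = find()
    while time.monotonic() < deadline:
        time.sleep(interval)
        latest = find()
        if len(latest) == len(items):
            return latest  # two consecutive polls agree: treat the count as stable
        items = latest
    return items  # timed out; return whatever the last poll saw
```

With Selenium it could be called as wait_for_stable_count(lambda: driver.find_elements("css selector", ".bRMDJf.islir > img[src]")). Unlike the helper above, this version always polls at least twice, even when the first lookup finds nothing.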

Now that we have the helper method, we can write the main script

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chromeDriverPath = r'C:\Users\Aorus\Downloads\Z_ARCHIWUM\PythonScript\chromedriver_win32\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(chromeDriverPath))

search_URL = 'https://www.google.com/search?q=budynki&rlz=1C1GCEU_plPL919PL919&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRyJvoo_L9AhWJxIsKHTIKDqwQ_AUoAXoECAEQAw&biw=1553&bih=724'
driver.get(search_URL)

a = input('Waiting for user input to start...')

# Scrolling all the way up
driver.execute_script('window.scrollTo(0, 0);')

# Print the src attribute of every thumbnail once the count has settled
for image in wait_for_images((By.CSS_SELECTOR, ".bRMDJf.islir > img[src]")):
    print(image.get_attribute("src"))

This prints the URL of each image. You can then navigate to each one separately and download it, or do whatever else you need with it.
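
As a rough sketch of the download step, the function below writes one src value to disk using only the standard library. The name save_image is hypothetical. It assumes a src may be either a regular http(s) URL or an inline base64 data: URI, which is worth checking against what the page actually returns.

```python
import base64
import urllib.request
from pathlib import Path


def save_image(src, destination):
    """Write the image referenced by `src` to `destination` and return the path."""
    destination = Path(destination)
    if src.startswith("data:"):
        # Inline image of the form "data:image/jpeg;base64,<payload>"
        header, _, payload = src.partition(",")
        if ";base64" not in header:
            raise ValueError("only base64-encoded data URIs are handled in this sketch")
        destination.write_bytes(base64.b64decode(payload))
    else:
        # Regular URL: fetch the bytes over HTTP(S)
        with urllib.request.urlopen(src) as response:
            destination.write_bytes(response.read())
    return destination
```

Inside the loop above it could be called as save_image(image.get_attribute("src"), f"image_{n}.jpg") with a counter n that you maintain. The file extension here is a guess, since the real image type is not checked.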
