I'm fairly new to the web scraping world but I really need to do some web scraping on the Thesaurus website for a project I'm working on. I have successfully created a program using beautifulsoup4 that asks the user for a word, then returns the most likely synonyms based on Thesaurus. However, I would like to not only have those synonyms but also the synonyms of every sense of the word (which is depicted on Thesaurus by a list of buttons above the synonyms). I noticed that when clicking a button, the name of the classes also change, so I did a little digging and decided to go with Selenium instead of beautifulsoup. I have now a code that writes a word on the search bar and clicks it, however, I'm unable to get the synonyms or the said buttons, simply because the find_element finds nothing, and being new to this, I'm afraid I'm using the wrong syntax.
This is my code at the moment (it looks for synonyms of "good"):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
PATH = "C:\Program Files (x86)\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://thesaurus.com")
search = driver.find_element_by_id("searchbar_input")
search.send_keys('good')
search.send_keys(Keys.RETURN)
try:
headword = WebDriverWait(driver,10).until(
EC.presence_of_element_located((By.ID, "headword"))
)
print(headword.text)
#buttons = headword.find_element_by_class_name("css-bjn8wh e1br8a1p0")
#print(buttons.text)
meanings = WebDriverWait(driver,10).until(
EC.presence_of_element_located((By.ID, "meanings"))
)
print(meanings.text)
#words = meanings.find_elements_by_class_name("css-1kg1yv8 eh475bn0")
#print(words.text)
except:
print('failed')
driver.quit()
For the first part, I want to access the buttons. The headword is simply the element that contains all the buttons I want to press. This is the headword element according to the inspect tool:
<div id="headword" class="css-bjn8wh e1br8a1p0">
<div class="css-vw3jp5 e1ibdjtj4">
*unecessary stuff*
<div class="css-bjn8wh e1br8a1p0">
<div class="postab-container css-cthfds ew5makj3">
<ul class="css-gap396 ew5makj2">
<li data-test-pos-tab="true" class="active-postab css-kgfkmr ew5makj4">
<a class="css-sc11zf ew5makj1">
<em class="css-1v93s5a ew5makj0">adj.</em>
<strong>pleasant, fine</strong>
</a>
</li>
<li data-test-pos-tab="true" class=" css-1ha4k0a ew5makj4">
*similar stuff*
<li data-test-pos-tab="true" class=" css-1ha4k0a ew5makj4">
...
where each one these <li data-test-pos-tab="true" class=" css-1ha4k0a ew5makj4">
is a button I want to click. So far I have tried a bunch of things like the one showed in the code, and also things like:
buttons = headword.find_elements_by_class_name("css-1ha4k0a ew5makj4")
buttons = headword.find_elements_by_css_selector("css-1ha4k0a ew5makj4")
buttons = headword.find_elements_by_class_name("postab-container css-cthfds ew5makj3")
buttons = headword.find_elements_by_css_selector("postab-container css-cthfds ew5makj3")
but in any case Selenium can find these elements.
For the second part I want the synonyms. Here is the meaning element:
<div id="meanings" class="css-16lv1yi e1qo4u831">
<div class="css-1f3egm3 efhksxz0">
*unecessary stuff*
<div data-testid="word-grid-container" class="css-ixatld e1cc71bi0">
<ul class="css-1ngwve3 e1ccqdb60">
<li>
<a font-weight="inherit" href="/browse/acceptable" data-linkid="nn1ov4" class="css-1kg1yv8 eh475bn0">
</a>
</li>
<li>
<a font-weight="inherit" href="/browse/bad" data-linkid="nn1ov4" class="css-1kg1yv8 eh475bn0">
...
where each of these elements is a synonym I want to get. Similarly to the previous case I tried several things such as:
synGrid = meanings.find_element_by_class_name("css-ixatld e1cc71bi0")
synGrid = meanings.find_element_by_css_selector("css-ixatld e1cc71bi0")
words = meanings.find_elements_by_class_name("css-1kg1yv8 eh475bn0")
words = meanings.find_elements_by_css_selector("css-1kg1yv8 eh475bn0")
And again Selenium cannot find these elements... I would really appreciate some help in order to achieve this, even if it is just a push in the right direction instead of giving a full solution. Hope I wrote all the needed information, if not, please let me know.
If you use
css selector
then you have to usedot
forclass
and
hash
forid
like you would use in files
.css
In
css selector
you can use also other methods avaliable inCSS
.See css selectors on
w3schools.com
Selenium converts
class_name
tocss selector
butclass_name()
expects single name and Selenium has problems when there are two or more names. When it convertsclass_name
tocss_selector
then it addsdot
only before first name but it needsdot
also before second and other names. So you have to manually add seconddot