I have tried several answers on Stack Overflow. When I print the webpage I can only see the equivalent of viewing the page source in Chrome, rather than the full DOM tree you would get from inspecting the page. As you can see, I have put a wait in, but this hasn't changed anything. Should I try Firefox instead of Chrome?
Is it possible that the website I'm trying to scrape has anti-scraping measures? What else could I try?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def selenium_start(url):
    # Start a headless Chrome session and load the page
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    try:
        # Wait up to 5 seconds for the element with id "koex" to be present
        WebDriverWait(driver, 5).until(
            EC.presence_of_element_located((By.ID, "koex")))
    except TimeoutException:
        print('Sorry!')
    return driver
webpage_driver = selenium_start('https://getbootstrap.com/docs/4.0/components/collapse/')
"""
div_container = webpage_driver.find_element(By.CLASS_NAME, 'maincontent')
html = webpage_driver.execute_script('return document.documentElement.outerHTML')
#inner_div = div_container.get_attribute('outerHTML')
"""
print(page_soup)
It's hard to tell from context, but if you have a string containing the page's HTML source, then parsing it with Beautiful Soup will do the job. It may not be ideal if you need to keep the number of dependencies as small as possible, but that's an easy fix.
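Something along these lines would work, as a minimal sketch: it assumes webpage_driver is the driver returned by your selenium_start() call, and it reuses the 'maincontent' class name from your commented-out block, which may or may not exist on that page.

from bs4 import BeautifulSoup

# Ask the browser for the DOM it has actually rendered, not the raw response
html = webpage_driver.execute_script('return document.documentElement.outerHTML')

# Parse that string with Beautiful Soup and query it like any other document
page_soup = BeautifulSoup(html, 'html.parser')
div_container = page_soup.find(class_='maincontent')  # class name taken from your snippet; may be None
print(page_soup.prettify())

For what it's worth, webpage_driver.page_source gives you effectively the same rendered DOM as the execute_script call, so if both still look like the raw source, the page's JavaScript content probably hasn't loaded before your wait times out.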