How to remove specific tags from HTML extracted via Python Selenium using driver.page_source


I want to write a script that automatically visits about 50 websites, extracts their HTML, and notifies me when anything has been updated. I am mainly interested in new data uploads. Since these are often PDF uploads, it is a little complicated. Even when nothing on a website has changed, the script reports an update because of the JavaScript included for Google Ads, Analytics, etc. These are false alarms for me, and I want the script to ignore such changes and not alert me about them. I also tried using requests for this script, but since most of the sites rely on JavaScript, I feel it is better to go with Selenium. Any input on the best way to approach this would be helpful. Thanks.

I tried using driver.find_elements(By.TAG_NAME, 'script').remove to remove the script tags, but it doesn't seem to have any effect on the result.

There is 1 best solution below

You can remove them with:

driver.execute_script("""
  for(let script of document.querySelectorAll('script')) script.remove()
""")

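This removes the script elements from the DOM, so they no longer appear in driver.page_source, but it does not undo JavaScript that has already run: anything those scripts have already inserted into the page (ad iframes, for example) stays in place.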
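
To connect this to the change-detection goal in the question, here is a minimal sketch of one possible approach: strip the noisy elements, hash what is left of driver.page_source, and compare against the hash stored from the previous run. The names STRIP_NOISE_JS, page_fingerprint and has_changed are made up for illustration, and the selector list (script, noscript, iframe, style) is an assumption you would likely need to tune per site. As a side note, the attempt in the question has no effect because find_elements returns an ordinary Python list, so .remove there is the list method and never touches the browser's DOM.

```python
import hashlib

# JavaScript executed inside the page. The selector list is an assumption:
# it targets elements that commonly differ between loads (ads, analytics).
STRIP_NOISE_JS = """
for (const el of document.querySelectorAll('script, noscript, iframe, style')) {
  el.remove();
}
"""


def page_fingerprint(driver):
    """Strip noisy elements from the loaded page and hash the remaining HTML."""
    driver.execute_script(STRIP_NOISE_JS)
    html = driver.page_source  # serialized DOM after the removal above
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(driver, url, known_hashes):
    """Load url and report whether its fingerprint differs from the stored one.

    known_hashes is a plain dict mapping url -> last seen fingerprint; it is
    updated in place. A first visit records the hash and returns False.
    """
    driver.get(url)
    new_hash = page_fingerprint(driver)
    old_hash = known_hashes.get(url)
    known_hashes[url] = new_hash
    return old_hash is not None and old_hash != new_hash
```

With a real browser you would create the driver with something like webdriver.Chrome(), call has_changed(driver, url, known_hashes) for each of the sites, and persist known_hashes (for example as JSON) between runs. Pages that embed timestamps or session tokens directly in their markup will still produce false alarms, so hashing only the part of the page that lists the uploads may work better for those sites.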