Extracting <script> from HTML with BeautifulSoup

801 Views Asked by plshelp At 18 July 2017 at 18:50

I'm searching through an HTML file with BeautifulSoup's find_all() function and have run into a couple of problems. First, since I only want the <script> tags, I have to call soup.find_all('script'), because find_all() won't accept the < and > characters. Is there a way around this? When I search for just script, I seem to get parts of the HTML file that are not script tags but merely use the word "script" in a URL or paragraph.
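For what it's worth, a plain string passed to find_all() is compared against tag names, not against the text of the page, so the angle brackets shouldn't be needed. Here is a minimal sketch to check that; the sample HTML, the app.js file name, and the /static/script/ URL are all made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one real <script> tag, plus the word "script"
# appearing in a paragraph and in a URL.
html = """<html><head><script src="app.js"></script></head>
<body>
<p>This paragraph mentions the word script.</p>
<a href="/static/script/readme.html">script docs</a>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# A string argument is matched against tag names only.
tags = soup.find_all("script")

print(len(tags))
print([tag.name for tag in tags])
```

If the real results still contain things that don't look like script tags, it may be worth printing tag.name for each result to see what was actually matched.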

Second, when I use soup.find_all('script'), some HTML files don't return all of their script tags. In some files the missing ones are <script> tags in the <head>; in others, they are scripts that handle the page parameters. Is there a way to force every script tag to be returned?

For example, one of the ignored <script> tags looks like this:

<!--[if lte IE 7]>
<script src="//www.webiste.com" type="text/javascript" ></script>
<![endif]-->

My code is:

from bs4 import BeautifulSoup

# `file` holds the path to the HTML file being parsed
with open(file) as f:
    soup = BeautifulSoup(f, 'html.parser')
tags = soup.find_all('script')

I'm trying to grab every <script>...</script> section out of the HTML file. This has been the easiest way I've found to do it, but if anyone knows of an easier way that will also fix my other problems, I'm open to changing my code.
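One possible explanation for the missing tags, at least for the example above: the <script> sits inside an HTML comment (an IE conditional comment), and the parser hands comments back as Comment nodes whose contents are plain text, so find_all('script') never sees the tag. A rough sketch of one workaround is to re-parse the text of each comment. The find_all_scripts helper, the sample document, and the main.js file name are hypothetical, and this only covers scripts hidden in comments, not other causes:

```python
from bs4 import BeautifulSoup, Comment


def find_all_scripts(html):
    """Collect <script> tags from the document and from inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    scripts = list(soup.find_all("script"))

    # Comments (including IE conditional comments) are Comment nodes,
    # so any markup inside them is just text until it is parsed again.
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for comment in comments:
        inner = BeautifulSoup(str(comment), "html.parser")
        scripts.extend(inner.find_all("script"))
    return scripts


# Hypothetical sample built around the conditional comment from the question.
sample = """<html><head>
<script src="main.js"></script>
<!--[if lte IE 7]>
<script src="//www.webiste.com" type="text/javascript" ></script>
<![endif]-->
</head><body></body></html>"""

for tag in find_all_scripts(sample):
    print(tag.get("src"))
```

Note that the tags found inside comments come after the regular ones in the returned list, so document order is not preserved in this sketch.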
