Extracting data from JavaScript (Python Scraper)

740 Views Asked by skeggse At 27 July 2025 at 09:26

I'm currently using a combination of urllib2, pyquery, and json to scrape a site, and now I find that I need to extract some data from JavaScript. One thought would be to use a JavaScript engine (like V8), but that seems like overkill for what I need. I would use regular expressions, but the expression for this seems way too complex.

JavaScript:

(function(){DOM.appendContent(this, HTML("<html>"));;})

I need to extract the <html>, but I'm not entirely sure how to do so. The <html> itself can contain basically every character under the sun, so [^"] won't work.

Any thoughts?


There are 2 best solutions below

edanfalls On 28 January 2011 at 09:17 BEST ANSWER

Why regex? Can't you just slice the string, since you know how many characters you want to trim off the beginning and end?

string[42:-7]

As well as being quicker than a regex, this works whether or not the quotes inside <html> are escaped.
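A minimal sketch of the slicing idea, assuming the wrapper around the HTML is always exactly the one shown in the question. The names PREFIX, SUFFIX, extract_html, and the sample payload are made up for illustration; checking the wrapper before slicing is an added safeguard so a changed page layout gives None instead of silently truncated data.

```python
# Fixed wrapper text copied from the question; anything else is an assumption.
PREFIX = '(function(){DOM.appendContent(this, HTML("'
SUFFIX = '"));;})'


def extract_html(js):
    """Return the text between the fixed wrapper, or None if the wrapper differs."""
    long_enough = len(js) >= len(PREFIX) + len(SUFFIX)
    if long_enough and js.startswith(PREFIX) and js.endswith(SUFFIX):
        # Equivalent to js[42:-7] for this particular wrapper.
        return js[len(PREFIX):-len(SUFFIX)]
    return None


# Hypothetical payload; the real one would come from the scraped page.
sample = PREFIX + '<div class=\\"a\\">hi</div>' + SUFFIX
print(len(PREFIX), len(SUFFIX))
print(extract_html(sample))
```

Computing the offsets from the wrapper strings rather than hard-coding 42 and 7 keeps the slice in step with the wrapper if it ever needs to be edited.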

Jens On 28 January 2011 at 07:38

If every occurrence of " inside the HTML code is escaped as \" (it is a JavaScript string after all), you could use

HTML\("((?:\\"|.)*?)"\)

to get the parameter to HTML into the first capturing group.

Note that this regex is not yet escaped for use inside a JavaScript string literal.
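Since the question is about a Python scraper, here is a rough sketch of how the pattern could be used with Python's re module. A raw string passes the backslashes to the regex engine unchanged, so no extra escaping is needed. The re.DOTALL flag, the function name extract_html_arg, and the sample input are assumptions added for illustration, not part of the answer above.

```python
import re

# The pattern from the answer, as a raw string.
# re.DOTALL is an added assumption so "." also matches newlines in the HTML.
HTML_ARG = re.compile(r'HTML\("((?:\\"|.)*?)"\)', re.DOTALL)


def extract_html_arg(js):
    """Return the raw (still JavaScript-escaped) argument of HTML("..."), or None."""
    match = HTML_ARG.search(js)
    return match.group(1) if match else None


# Hypothetical input with escaped quotes inside the HTML.
sample = r'(function(){DOM.appendContent(this, HTML("<a href=\"x\">link</a>"));;})'

raw = extract_html_arg(sample)
# Undo only the \" escape; other JavaScript escapes are left untouched here.
html = raw.replace('\\"', '"')
print(html)
```

The captured group still contains the JavaScript escapes, so the last step only turns \" back into ". If the string can contain other escapes (such as \n or \\), they would need handling as well.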
