Using Regular Expressions with Twill


I'm currently using urllib2 and BeautifulSoup to open and parse HTML data. However, I've run into a problem with a site that uses JavaScript to load the images after the page has been rendered (I'm trying to find the image source for a certain image on the page).

I'm thinking Twill could be a solution, and am trying to open the page and use a regular expression with 'find' to return the html string I'm looking for. I'm having some trouble getting this to work though, and can't seem to find any documentation or examples on how to use regular expressions with Twill.

Any help or advice on how to do this or solve this problem in general would be much appreciated.


There are 2 best solutions below

starenka On

I'd rather use CSS selectors or "real" regexps on the page source. Twill is, AFAIK, no longer being worked on. Have you tried BeautifulSoup (BS) or PyQuery with CSS selectors?
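
Roughly, both approaches could look like the sketch below. It is only an illustration: the function names, the `SAMPLE_HTML` string and the default selector are made up, and the second function assumes BeautifulSoup 4 (`bs4`) is installed. Keep in mind that neither approach runs JavaScript, so they only find an image URL that is already present somewhere in the downloaded source (for example inside an inline script).

```python
import re

# Matches the src attribute of an <img> tag; the value may be in single or double quotes.
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_img_sources(html):
    """Return every <img> src value found in the raw page source using a plain regex."""
    return IMG_SRC_RE.findall(html)


def select_img_sources(html, selector="img"):
    """Return src values of elements matching a CSS selector (assumes bs4 is installed)."""
    # Imported here so the regex helper above works without BeautifulSoup.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [tag.get("src") for tag in soup.select(selector) if tag.get("src")]


# Hypothetical page fragment, used only to demonstrate the helpers.
SAMPLE_HTML = '<div><img id="logo" src="/static/logo.png"><img src=\'a.jpg\' alt="x"></div>'
```

With the sample above, `find_img_sources(SAMPLE_HTML)` picks up both `src` values, while something like `select_img_sources(SAMPLE_HTML, "img#logo")` would narrow it down to one specific image.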

amadain On

Twill does not work with JavaScript (see http://twill.idyll.org/browsing.html).

Use WebDriver (Selenium) if you need to handle JavaScript.
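
A minimal sketch of what that might look like, assuming the Selenium 4 Python bindings and Firefox with its driver are installed. The function name, the timeout and whatever URL and selector you pass in are placeholders, not anything taken from your site.

```python
def rendered_image_src(url, css_selector, timeout=10):
    """Load url in a real browser and return the src of the first matching element."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox; webdriver.Chrome() is used the same way
    try:
        driver.get(url)
        # Wait until the page's scripts have inserted the element into the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.get_attribute("src")
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

You would call it with the page address and a selector for the image you are after, e.g. `rendered_image_src(page_url, "img.some-class")`. If the script sets `src` some time after creating the element, you may need a custom wait condition instead of `presence_of_element_located`.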