Scraping HTML generated by JavaScript with Python


My code:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get(url)  # url: address of the page being scraped
result = r.html.find('.YD-Header')

This works when the class name is "YD-Header".

But I would like to scrape an HTML element with this class attribute:

 <td class="Fw(500) Ta(end) Pstart(10px) Miw(60px)">
 </td>

My code doesn't find anything if I use:

 result = r.html.find('.Fw(500)')

How can I select an element by this class (the class attribute contains spaces and parentheses)?


There is 1 solution below

Booyakasha On

CSS requires that characters like ( and ) be escaped with the \ character, so for classes with those names you end up with selectors like .Pstart\(20px\).

The issue here is that JavaScript strings also treat \ as the escape character. The JS string '.Pstart\(20px\)' represents the string .Pstart(20px), which is again not a valid CSS selector because the parentheses aren't escaped.

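The solution is to double-escape the classes: the JS string '.Pstart\\(20px\\).Pstart\\(40px\\)--md' accurately represents the .Pstart\(20px\).Pstart\(40px\)--md CSS selector. The same holds for Python string literals: '.Fw\\(500\\)' and the raw string r'.Fw\(500\)' both yield the selector .Fw\(500\).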
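As a rough illustration of the escaping described above, here is a minimal Python sketch. The helper class_selector is hypothetical (it is not part of requests_html or any library). It assumes that backslash-escaping every character other than letters, digits, hyphens and underscores is enough for the class names in the question, and it does not handle names that start with a digit.

```python
import re

# Two ways to write the same selector text in Python source:
raw = r".Fw\(500\)"        # raw string: backslashes are kept as typed
doubled = ".Fw\\(500\\)"   # regular string: each \\ becomes one backslash


def class_selector(*class_names):
    """Build a selector matching elements that carry all given classes.

    Hypothetical helper: puts a backslash in front of every character
    that is not a word character or a hyphen.
    """
    def escape(name):
        return re.sub(r"([^\w-])", r"\\\1", name)

    return "".join("." + escape(name) for name in class_names)


selector = class_selector("Fw(500)", "Ta(end)")
print(raw == doubled)  # True
print(selector)        # .Fw\(500\).Ta\(end\)
```

Assuming the selector engine behind r.html.find accepts standard CSS backslash escapes, the result could then be passed as r.html.find(selector) in place of the unescaped '.Fw(500)'.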