PyQuery won't return elements on a page


I've set up a Python script to open this web page with PyQuery.

import requests
from pyquery import PyQuery

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)
pqPage = PyQuery(page.content)

But pqPage("li") returns only a blank list, []. Meanwhile, pqPage.text() shows the text of the page's HTML, which includes li elements.

Why won't the code return a list of li elements? How do I make it do that?

furas On BEST ANSWER

It seems PyQuery has trouble with this page, maybe because it is an XHTML page, or because it declares the namespace xmlns="http://www.w3.org/1999/xhtml".

When I use

pqPage.css('li')

then I get

[<{http://www.w3.org/1999/xhtml}html#sfFrontendHtml>]

which shows {http://www.w3.org/1999/xhtml} in the element name; that is the namespace. Some modules have problems with HTML that uses namespaces.
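To illustrate the namespace effect on its own, here is a minimal sketch using the standard library's xml.etree.ElementTree rather than PyQuery. The SAMPLE document is made up for the example and is not the real page; the point is only that an XML parser stores every tag with its namespace, so a plain "li" lookup finds nothing.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document (an assumption), not the real page.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><ul><li>Mayor</li><li>Clerk</li></ul></body>"
    "</html>"
)

root = ET.fromstring(SAMPLE)

# An XML parser keeps the namespace as part of each tag name.
print(root.tag)

# A plain "li" lookup does not match the namespaced tags.
plain_matches = root.findall(".//li")
print(len(plain_matches))

# The namespace-qualified name does match.
qualified_matches = root.findall(".//{%s}li" % XHTML_NS)
print([li.text for li in qualified_matches])
```

This is probably what happens when the page is parsed as XML: the li elements are present, but under a qualified name that a bare li selector does not match.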


I have no problem getting the li elements with BeautifulSoup:

import requests
from bs4 import BeautifulSoup as BS

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)

soup = BS(page.text, 'html.parser')
for item in soup.find_all('li'):
    print(item.text)
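A likely reason the 'html.parser' route works is that an HTML parser treats xmlns as an ordinary attribute, so tag names stay plain. The sketch below shows this with the standard library's html.parser directly. The SAMPLE string and the ListItemCollector class are made up for the example, and this is not how BeautifulSoup is implemented internally.

```python
from html.parser import HTMLParser

# Hypothetical sample document (an assumption), not the real page.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><ul><li>Mayor</li><li>Clerk</li></ul></body>"
    "</html>"
)


class ListItemCollector(HTMLParser):
    """Collect the text found inside each <li> element."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # The tag name arrives as plain "li"; xmlns is just another attribute.
        if tag == "li":
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        # Append text only while inside an <li>.
        if self.in_li:
            self.items[-1] += data


collector = ListItemCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.items)
```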

EDIT: after some searching on Google, I found that passing parser="html" to PyQuery() lets me get the li elements.

import requests
from pyquery import PyQuery

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)

pqPage = PyQuery(page.text, parser="html")
for item in pqPage('li p'):
    print(item.text)