Scraping a webpage with Python

237 Views Asked by Bhav At 30 June 2025 at 02:48

I'm trying to learn how to scrape a webpage (http://www.expressobeans.com/public/detail.php/185246), but I don't know what I'm doing wrong. I think it has to do with identifying the XPath, but how do I get the correct path (if that is the issue)? I've tried Firebug in Firefox as well as the Developer Tools in Chrome.

I want to be able to scrape the Manufacturer value (D&L Screenprinting) as well as all the Edition Details.

Python script:

from lxml import html
import requests

page = requests.get('http://www.expressobeans.com/public/detail.php/185246')

tree = html.fromstring(page.text)

buyers = tree.xpath('//*[@id="content"]/table/tbody/tr[2]/td/table/tbody/tr/td[1]/dl/dd[3]')

print(buyers)

returns:

[]


There are 2 best solutions below

eugenioy On 10 June 2015 at 21:13

I'd start by looking at the page's HTML for a node closer to the value you want, and build your path from there so that it is shorter and easier to follow.

On that page there is a "dl" with class "itemListingInfo", and all the information you are looking for sits under it.

Also, if you want the "D&L Screenprinting" text, you need to extract the text from the link.

Try this modified version. It should be straightforward to add further XPath expressions to get the other fields as well.

from lxml import html
import requests

page = requests.get('http://www.expressobeans.com/public/detail.php/185246')

tree = html.fromstring(page.text)

buyers = tree.xpath('//dl[@class="itemListingInfo"]/dd[2]/a/text()')

print(buyers)
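To collect every field rather than one "dd" at a time, each "dt" label can be paired with the "dd" that follows it. The following is only a sketch: the SAMPLE markup is a made-up stand-in for the real page (whose exact structure may differ), and dl_to_dict is a hypothetical helper name, not part of lxml.

```python
from lxml import html

# Hypothetical stand-in for the page's markup; the real page may differ.
SAMPLE = """<html><body>
<dl class="itemListingInfo">
  <dt>Artist</dt><dd><a href="#">Example Artist</a></dd>
  <dt>Manufacturer</dt><dd><a href="#">D&amp;L Screenprinting</a></dd>
  <dt>Edition Details</dt><dd>placeholder edition text</dd>
</dl>
</body></html>"""


def dl_to_dict(tree, css_class):
    """Map each <dt> label to the text of the first <dd> after it."""
    info = {}
    # $cls is an XPath variable, filled in from the keyword argument.
    for dt in tree.xpath('//dl[@class=$cls]/dt', cls=css_class):
        dd = dt.xpath('following-sibling::dd[1]')
        if dd:
            # text_content() also picks up text nested inside links.
            info[dt.text_content().strip()] = dd[0].text_content().strip()
    return info


tree = html.fromstring(SAMPLE)
info = dl_to_dict(tree, "itemListingInfo")
print(info["Manufacturer"])  # D&L Screenprinting
```

Against the live page you would build the tree from page.text as in the script above and pass the same class name, assuming the labels and values really are laid out as "dt"/"dd" pairs.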

One On 10 June 2015 at 20:55

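Remove tbody from the XPath. Browser developer tools show the DOM after the browser has inserted implicit tbody elements into tables, whereas the raw HTML that lxml parses typically contains none, so a path copied from the browser can match nothing.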

buyers = tree.xpath('//*[@id="content"]/table/tr[2]/td/table/tr/td[1]/dl/dd[3]')
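The tbody mismatch can be reproduced offline. This is a minimal sketch on a made-up table (RAW is hypothetical markup, not the real page): lxml parses the HTML as written and does not add a tbody, so only the paths without it match.

```python
from lxml import html

# Hypothetical raw markup with no <tbody>, as a server might send it.
RAW = (
    '<html><body><div id="content"><table>'
    '<tr><td>first</td></tr>'
    '<tr><td>second</td></tr>'
    '</table></div></body></html>'
)

tree = html.fromstring(RAW)

# Path as copied from browser developer tools, including tbody.
with_tbody = tree.xpath('//*[@id="content"]/table/tbody/tr[2]/td/text()')

# Same path with tbody removed.
without_tbody = tree.xpath('//*[@id="content"]/table/tr[2]/td/text()')

# Tolerant form: '//' matches the rows whether or not a tbody is present.
tolerant = tree.xpath('//*[@id="content"]/table//tr[2]/td/text()')

print(with_tbody)     # []
print(without_tbody)  # ['second']
print(tolerant)       # ['second']
```

The tolerant form is convenient when the same expression has to work on both kinds of markup, but note that with nested tables '//' would also reach rows of the inner tables.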
