I want to extract only the sales rank (which in this case is 5) from this text:
Amazon Best Sellers Rank: #5 in Books (See Top 100 in Books)
It comes from this web page: http://www.amazon.com/Mockingjay-Hunger-Games-Book-3/dp/0439023513/ref=tmm_hrd_title_0
So far I have gotten down to this, which selects "Amazon Best Sellers Rank:":
//li[@id='SalesRank']/b/text()
I am using PHP DOMDocument and DOMXPath.
You can use pure XPath:
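One way this could look is sketched below. It assumes the rank appears in the text of the `li` element as `#5 in Books ...`; the `$html` snippet is a hypothetical stand-in for the real page, which may be structured differently. The expression takes the string value of the `li`, keeps what follows the `#`, and cuts at the next space. Because the expression returns a string rather than a node set, it is run with `DOMXPath::evaluate()` instead of `query()`.

```php
<?php
// Hypothetical markup standing in for the relevant part of the page.
$html = <<<HTML
<ul>
  <li id="SalesRank">
    <b>Amazon Best Sellers Rank:</b> #5 in Books (<a href="/best-sellers-books">See Top 100 in Books</a>)
  </li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// String value of the <li>, then everything after '#', then everything before the next space.
$rank = $xpath->evaluate(
    "substring-before(substring-after(//li[@id='SalesRank'], '#'), ' ')"
);

echo $rank, "\n";
```

If the `li` is missing, or its text has no `#`, the expression yields an empty string, so it is worth checking for that before using the value.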
However, if your input is a bit messy you might get more reliable results by using XPath to grab the parent node's text, and then using a regex on the text to get the specific thing you want.
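A minimal sketch of that second approach, under the same assumption about the markup (the `$html` snippet and the regex pattern are illustrative, not taken from the real page): XPath selects the `li`, and `preg_match()` pulls the digits that follow the `#` out of its text content.

```php
<?php
// Hypothetical markup standing in for the relevant part of the page.
$html = <<<HTML
<ul>
  <li id="SalesRank">
    <b>Amazon Best Sellers Rank:</b> #5 in Books (<a href="/best-sellers-books">See Top 100 in Books</a>)
  </li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Grab the parent node; item(0) is null if nothing matched.
$node = $xpath->query("//li[@id='SalesRank']")->item(0);

$rank = null;
// Match '#' followed by digits, allowing thousands separators such as "#1,234".
if ($node !== null && preg_match('/#([\d,]+)/', $node->textContent, $m)) {
    $rank = (int) str_replace(',', '', $m[1]);
}

var_dump($rank);
```

The regex tolerates extra whitespace or surrounding text that would trip up the `substring-before`/`substring-after` version, and `$rank` stays `null` when no rank is found.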
Demonstration of both methods using PHP with DOMDocument and DOMXPath:
The output I get is: