Below are two web pages that have tabs such as "Features", "Application" and "Benefits". I want to extract the content of the "Features" tab only. One page has "Features" as its first tab; the other has "Benefits" instead of a "Features" tab.
http://www.eaton.com/Eaton/ProductsServices/Hydraulics/Accumulators/PCT_256248
http://www.eaton.com/Eaton/ProductsServices/Vehicle/Superchargers/RSeries/index.htm#tabs-2
What I tried: using the code below with the XPath //a[span='Features']/../../../div/div, I get the content of all the tabs on the page. What I am looking for is a generic XPath that returns the content of the "Features" tab only, and returns nothing if the page has no "Features" tab.
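import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

HtmlCleaner htmCleaner = new HtmlCleaner();
String s = "http://www.eaton.com/Eaton/ProductsServices/Hydraulics/Accumulators/PCT_256248";
// Fetch the page with Jsoup, using a browser-like user agent and a 30-second timeout.
Document doc = Jsoup.connect(s).timeout(30000).userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2").get();
String pageContent = doc.toString();
// Re-parse the HTML with HtmlCleaner so that an XPath can be evaluated on it.
TagNode node = htmCleaner.clean(pageContent);
// Problem: this XPath returns the content of every tab, not only "Features".
Object[] statsNode = node.evaluateXPath("//a[span='Features']/../../../div/div");
for (int i = 0; i < statsNode.length; i++) {
    TagNode resultNode = (TagNode) statsNode[i];
    System.out.print(resultNode.getText());
}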
Notice that the target div id corresponds to the href attribute of the tab header. For example, when the href attribute value is "#tabs-1", the corresponding div id attribute value is "tabs-1". Taking advantage of that correlation, this is one possible XPath that will return the <div> element that corresponds to the Features link/tab, or return nothing in the absence of a Features tab:
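As a minimal sketch of the href-to-id correlation described above, the following expression is one XPath 1.0 candidate that fits that description; it is an assumption, not necessarily the exact expression the answer had in mind. It looks up the href of the link whose span reads "Features", strips the leading "#" with substring-after, and selects the div whose id equals the remainder. The sketch evaluates it with the JDK's standard javax.xml.xpath API rather than HtmlCleaner's evaluateXPath, because HtmlCleaner's built-in XPath support may not cover functions such as substring-after. The class name, the page helper, and the simplified tab markup are hypothetical stand-ins for the real pages.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeaturesTabXPath {

    // Assumed expression: take the href of the "Features" link, drop the "#",
    // and select the div whose id equals what is left.
    public static final String FEATURES_XPATH =
            "//div[@id = substring-after(//a[span='Features']/@href, '#')]";

    // Parses well-formed markup and returns the text of the matching div(s),
    // or an empty string when no "Features" link exists.
    public static String extractFeatures(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(FEATURES_XPATH, doc, XPathConstants.NODESET);
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < hits.getLength(); i++) {
            text.append(hits.item(i).getTextContent().trim());
        }
        return text.toString();
    }

    // Hypothetical, simplified tab markup; only the first tab's label varies.
    public static String page(String firstTabLabel) {
        return "<div id=\"tabs\">"
                + "<ul>"
                + "<li><a href=\"#tabs-1\"><span>" + firstTabLabel + "</span></a></li>"
                + "<li><a href=\"#tabs-2\"><span>Application</span></a></li>"
                + "</ul>"
                + "<div id=\"tabs-1\">content of first tab</div>"
                + "<div id=\"tabs-2\">content of second tab</div>"
                + "</div>";
    }

    public static void main(String[] args) throws Exception {
        // Page with a "Features" tab: prints the first tab's content.
        System.out.println("[" + extractFeatures(page("Features")) + "]");
        // Page with "Benefits" instead: prints an empty result.
        System.out.println("[" + extractFeatures(page("Benefits")) + "]");
    }
}
```

When no "Features" link exists, the inner node set is empty, substring-after yields an empty string, and no div with a non-empty id can match, so nothing is returned. To run this against a real page cleaned by HtmlCleaner, the TagNode would first have to be converted to a W3C DOM document, for example with HtmlCleaner's DomSerializer; that step is left out of the sketch.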