How to get the intersection of two xpath node sets


To get the union of two different node sets, I can use the | operator:

node.xpath(
    '(//C:Year[not(@value="2019")]) | (//R:Product[@value="Phone"])'
    , namespaces={'C': 'Columns', 'R': 'Rows'})

Is there a way to get the intersection of the two without knowing the relationship between the two paths (i.e., allowing them to appear in any order)? I tried the following:

node.xpath('(//C:Year[not(@value="2019")]) and (//R:Product[@value="Phone"])', namespaces={'C': 'Columns', 'R': 'Rows'})

But the and operator seems to return a boolean instead of a node set. What would be the proper way to do this?

I'm not sure where best to share XML/XPath expressions, but you can go to https://extendsclass.com/xpath-tester.html and paste in the following XPath and XML to try it:

Expression: //C:Year[not(@value="2019")] | //R:Product[@value="Phone"]
XML:        <Data xmlns:R="Rows" xmlns:C="Columns" xmlns:V="Values"><R:ProductGroup value="Electronics"><R:Product value="Computer"><C:Year value="2018"><V:SumOfRevenue value="104"/><V:SumOfUnits   value="3"/></C:Year><C:Year value="2019"><V:SumOfRevenue value="82"/><V:SumOfUnits   value="9"/></C:Year><C:Year value="(all)"><V:SumOfRevenue value="186"/><V:SumOfUnits   value="12"/></C:Year></R:Product><R:Product value="Phone"><C:Year value="2018"><V:SumOfRevenue value="102"/><V:SumOfUnits   value="4"/></C:Year><C:Year value="2019"><V:SumOfRevenue value="99"/><V:SumOfUnits   value="12"/></C:Year><C:Year value="(all)"><V:SumOfRevenue value="201"/><V:SumOfUnits   value="16"/></C:Year></R:Product><R:Product value="(all)"><C:Year value="2018"><V:SumOfRevenue value="206"/><V:SumOfUnits   value="7"/></C:Year><C:Year value="2019"><V:SumOfRevenue value="181"/><V:SumOfUnits   value="21"/></C:Year><C:Year value="(all)"><V:SumOfRevenue value="387"/><V:SumOfUnits   value="28"/></C:Year></R:Product></R:ProductGroup><R:ProductGroup value="Media"><R:Product value="Movies"><C:Year value="2018"><V:SumOfRevenue value="25"/><V:SumOfUnits   value="12"/></C:Year><C:Year value="2019"><V:SumOfRevenue value="26"/><V:SumOfUnits   value="13"/></C:Year><C:Year value="(all)"><V:SumOfRevenue value="51"/><V:SumOfUnits   value="25"/></C:Year></R:Product><R:Product value="Theater"><C:Year value="2018"><V:SumOfRevenue value="17"/><V:SumOfUnits   value="3"/></C:Year><C:Year value="2019"><V:SumOfRevenue value="20"/><V:SumOfUnits   value="6"/></C:Year><C:Year value="(all)"><V:SumOfRevenue value="37"/><V:SumOfUnits   value="9"/></C:Year></R:Product><R:Product value="(all)"><C:Year value="2018"><V:SumOfRevenue value="42"/><V:SumOfUnits   value="15"/></C:Year><C:Year value="2019"><V:SumOfRevenue value="46"/><V:SumOfUnits   value="19"/></C:Year><C:Year value="(all)"><V:SumOfRevenue value="88"/><V:SumOfUnits   value="34"/></C:Year></R:Product></R:ProductGroup><R:ProductGroup value="(all)"><R:Product 
value="(all)"><C:Year value="2018"><V:SumOfRevenue value="248"/><V:SumOfUnits   value="22"/></C:Year><C:Year value="2019"><V:SumOfRevenue value="227"/><V:SumOfUnits   value="40"/></C:Year><C:Year value="(all)"><V:SumOfRevenue value="475"/><V:SumOfUnits   value="62"/></C:Year></R:Product></R:ProductGroup></Data>

One possible solution is to 'go back to the root' for each intersection by using ancestor::RootName, so we would have:

//C:Year[not(@value="2019")]/ancestor::Data//R:Product[@value="Phone"]

Is there another way to do this?

Best answer

In XPath 2.0, use the intersect operator.

There's no simple way of doing it in XPath 1.0.
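If you are stuck on an XPath 1.0 engine (lxml's xpath() is one, as far as I know), one workaround is to evaluate each path separately and intersect the results in the host language. The following is only a sketch under some assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, a cut-down version of the XML from the question, and a hypothetical helper named intersect. The two node sets are chosen so that they can actually overlap (both contain C:Year elements).

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping, same as in the question.
NS = {"C": "Columns", "R": "Rows"}

# Cut-down sample document (an assumption, not the full XML from the question).
XML = """<Data xmlns:R="Rows" xmlns:C="Columns">
  <R:Product value="Computer">
    <C:Year value="2018"/><C:Year value="2019"/>
  </R:Product>
  <R:Product value="Phone">
    <C:Year value="2018"/><C:Year value="2019"/><C:Year value="(all)"/>
  </R:Product>
</Data>"""


def intersect(first, second):
    """Hypothetical helper: elements of `first` that also occur in `second`.

    Nodes are compared by identity, and the order of `first` is kept.
    """
    second_ids = {id(node) for node in second}
    return [node for node in first if id(node) in second_ids]


root = ET.fromstring(XML)

# ElementTree's path language has no not(), so that filter is done in Python.
not_2019 = [y for y in root.findall(".//C:Year", NS) if y.get("value") != "2019"]
phone_years = root.findall(".//R:Product[@value='Phone']/C:Year", NS)

common = intersect(not_2019, phone_years)
print([y.get("value") for y in common])  # ['2018', '(all)']
```

The same idea should carry over to the lists that node.xpath(...) returns in lxml, as long as both result lists are kept alive while they are compared.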

I'm wondering, though, whether you really want the intersection. The intersection of a set of C:Year elements with a set of R:Product elements is going to be empty: no element can be a member of both sets, because it can be a C:Year or an R:Product but not both.
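To make that concrete, here is a small sketch (again using xml.etree.ElementTree as a stand-in and a tiny made-up document, both assumptions on my part) that collects the two node sets from the question and checks which nodes appear in both. Each set is non-empty, but nothing is shared between them.

```python
import xml.etree.ElementTree as ET

NS = {"C": "Columns", "R": "Rows"}

# Tiny illustrative document, not the full XML from the question.
root = ET.fromstring(
    '<Data xmlns:R="Rows" xmlns:C="Columns">'
    '<R:Product value="Phone">'
    '<C:Year value="2018"/><C:Year value="2019"/>'
    '</R:Product>'
    '</Data>'
)

# First set: C:Year elements whose value is not "2019".
years = [y for y in root.findall(".//C:Year", NS) if y.get("value") != "2019"]
# Second set: R:Product elements whose value is "Phone".
products = root.findall(".//R:Product[@value='Phone']", NS)

# Nodes present in both sets, compared by identity.
product_ids = {id(p) for p in products}
shared = [y for y in years if id(y) in product_ids]

print(len(years), len(products), len(shared))  # 1 1 0
```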

So I suspect that what you want isn't actually the set intersection, but something else. But I can't work out what you want from your question.