Selecting nodes with conditions


I am trying to transform XML files, but I am stuck. The idea is to gather every element from one <ore:Aggregation> node up to the next one into a single item, a kind of itemization. However, I can't get more than one edm:WebResource into each dc:item I create.

XML :

<rdf:RDF>
    <ore:Aggregation rdf:about="id1">
        <some:crap/>
    </ore:Aggregation>
    <edm:ProvidedCHO rdf:about="id1">
        <some:crap/>
    </edm:ProvidedCHO>
    <edm:WebResource rdf:about="some/random/url"> 
        <some:crap/>
    </edm:WebResource>
            ...
               (n 'edm:WebResource' nodes)
            ...
    <edm:WebResource rdf:about="some/random/url">
        <some:crap/>    
    </edm:WebResource>

    <ore:Aggregation rdf:about="id2">
        <some:crap/>
    </ore:Aggregation>
    <edm:ProvidedCHO rdf:about="id2">
        <some:crap/>
    </edm:ProvidedCHO>
    <edm:WebResource rdf:about="some/random/url"> 
        <some:crap/>
    </edm:WebResource>
            ...
               (n 'edm:WebResource' nodes)
            ...
    <edm:WebResource rdf:about="some/random/url">
        <some:crap/>    
    </edm:WebResource>

        ... and on and on ...
</rdf:RDF>

XSL

<xsl:template match="/">
    <xsl:apply-templates select="/rdf:RDF/ore:Aggregation"/>
</xsl:template>

<xsl:template match="/rdf:RDF/ore:Aggregation">
    <rdf:RDF>
    <xsl:for-each select=".">
            <dc:item>
                <xsl:attribute name="rdf:about">
                    <xsl:value-of select="concat($fileName, '_item', position())"/>
                </xsl:attribute>

                <xsl:copy-of select="."/>
                <xsl:copy-of select="following-sibling::edm:ProvidedCHO[1]"/>
                <xsl:copy-of select="following-sibling::edm:WebResource[1]"/>

                <!-- WHERE IT SUCKS -->
                <xsl:if test="local-name(following-sibling::*[3]) = 'edm:WebResource'">
                    <xsl:copy-of select="following-sibling::*[3]"/>
                </xsl:if>                    
                <!-- ./WHERE IT SUCKS -->


            </dc:item>
    </xsl:for-each>
    </rdf:RDF>
</xsl:template>

Another attempt, which brings in too many nodes:

<!-- WHERE IT SUCKS -->
<xsl:copy-of select="following-sibling::*[local-name (preceding::*[1]) = 'ore:Aggregation']"/>
<!-- ./WHERE IT SUCKS -->

Expected Output

<!-- ITEM N1 -->
<rdf:RDF>
    <dc:item rdf:about="some.concat.string"/>
    <ore:Aggregation rdf:about="id1">
        <some:crap/>
    </ore:Aggregation>
    <edm:ProvidedCHO rdf:about="id1">
        <some:crap/>
    </edm:ProvidedCHO>
    <edm:WebResource rdf:about="some/random/url"> 
        <some:crap/>
    </edm:WebResource>
</rdf:RDF>

<!-- ITEM N2 -->
<rdf:RDF>
     <dc:item rdf:about="some.concat.string"/>
     <ore:Aggregation rdf:about="id1">
     <etc/>

There are 2 best solutions below

Best answer

In XSLT 2.0, this looks like a job for xsl:for-each-group (see http://www.xml.com/pub/a/2003/11/05/tr.html), in particular with its group-starting-with attribute:

 <xsl:for-each-group select="*" group-starting-with="ore:Aggregation">

This would be done when positioned on the parent rdf:RDF element. It arranges all the child elements into groups, with an ore:Aggregation element starting each group. The code within xsl:for-each-group is then evaluated once for each ore:Aggregation element, and you can use the current-group() function to access all the elements in that group.

Try this XSLT for starters:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:rdf="rdf" xmlns:ore="ore" xmlns:dc="dc">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:template match="rdf:RDF">
        <xsl:for-each-group select="*" group-starting-with="ore:Aggregation">
            <rdf:RDF xmlns:edm="edm" xmlns:ore="ore" xmlns:some="some">
                <dc:item rdf:about="{concat('item', position())}" />
                <xsl:apply-templates select="current-group()" />
            </rdf:RDF>
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet> 

Note that the output XML this generates is not well-formed, as it lacks a single root element. It would be much better to add one: the output becomes well-formed, and the namespace declarations can all go in one place:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:rdf="rdf" xmlns:ore="ore" xmlns:dc="dc">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:template match="rdf:RDF">
        <rdf:root xmlns:edm="edm" xmlns:ore="ore" xmlns:some="some">
            <xsl:for-each-group select="*" group-starting-with="ore:Aggregation">
                <rdf:RDF>
                    <dc:item rdf:about="{concat('item', position())}" />
                    <xsl:apply-templates select="current-group()" />
                </rdf:RDF>
            </xsl:for-each-group>
        </rdf:root>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet> 

Also note the use of an Attribute Value Template to create the rdf:about attribute, which further reduces the amount of code needed.
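To make that comparison concrete, here is a minimal sketch that writes the same attribute both ways. The fileName parameter comes from the question's stylesheet, but its declaration and default value here are assumptions, and the namespace URIs are the same placeholders used above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
                xmlns:rdf="rdf" xmlns:ore="ore" xmlns:dc="dc">
    <!-- Assumed parameter: the question uses $fileName without showing its declaration -->
    <xsl:param name="fileName" select="'myfile'"/>

    <xsl:template match="rdf:RDF">
        <rdf:root>
            <xsl:for-each-group select="*" group-starting-with="ore:Aggregation">
                <!-- Long form, as written in the question -->
                <dc:item>
                    <xsl:attribute name="rdf:about">
                        <xsl:value-of select="concat($fileName, '_item', position())"/>
                    </xsl:attribute>
                </dc:item>
                <!-- The same attribute as an attribute value template -->
                <dc:item rdf:about="{concat($fileName, '_item', position())}"/>
            </xsl:for-each-group>
        </rdf:root>
    </xsl:template>
</xsl:stylesheet>
```

Each group emits two dc:item elements here only to show the two forms side by side; in a real stylesheet you would keep just one of them.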

Second answer

I've run into this case a few times before and the best solution that I've been able to come up with involves set manipulation.

In a way this makes sense, since you want all of the nodes after each ore:Aggregation until the next ore:Aggregation.

Put another way, you want all of the following nodes except for the next ore:Aggregation and everything that follows it.

Thankfully, XSLT 2.0 has set operators (union, intersect, except), so we can avoid the convoluted hoops that XSLT 1.0 would have required.

Try this XPath:

following-sibling::node() except 
  (following-sibling::ore:Aggregation | 
   following-sibling::ore:Aggregation/following-sibling::node())

This should give you the node-set that you expect.
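As a rough sketch of how that expression might slot into a complete stylesheet: the rdf:root wrapper element and the placeholder namespace URIs are borrowed from the other answer and are assumptions, not part of the original data.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
                xmlns:rdf="rdf" xmlns:ore="ore" xmlns:dc="dc">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- Wrap all items in one root element so the output is well-formed -->
    <xsl:template match="rdf:RDF">
        <rdf:root>
            <xsl:apply-templates select="ore:Aggregation"/>
        </rdf:root>
    </xsl:template>

    <!-- One rdf:RDF item per ore:Aggregation -->
    <xsl:template match="ore:Aggregation">
        <rdf:RDF>
            <dc:item rdf:about="{concat('item', position())}"/>
            <!-- The aggregation itself -->
            <xsl:copy-of select="."/>
            <!-- Every following sibling up to, but not including, the next ore:Aggregation -->
            <xsl:copy-of select="following-sibling::node() except
                (following-sibling::ore:Aggregation |
                 following-sibling::ore:Aggregation/following-sibling::node())"/>
        </rdf:RDF>
    </xsl:template>
</xsl:stylesheet>
```

Because the expression selects node() rather than *, whitespace text nodes between the elements are copied as well; switch to following-sibling::* if only elements are wanted.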

This pattern generally works whenever you need to sort a flat list of elements into some sort of nested result structure. For example, the problem I had was finding all elements that had a particular attribute and then grouping them into one.

So, the general solution is (in pseudo-code):

following-sibling::node() except 
  (following-sibling::{next-node-selection-criteria} | 
   following-sibling::{next-node-selection-criteria}/following-sibling::node())

This would be a little simpler if we could specify both the terminal tag and its following siblings at the same time, but it doesn't actually read too badly.
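For anyone limited to XSLT 1.0, where the except operator is not available, one common workaround is a key that indexes each element by the nearest preceding ore:Aggregation. This is a different technique from the set expression above, offered only as a sketch: the key name members, the rdf:root wrapper, and the placeholder namespace URIs are all assumptions.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:rdf="rdf" xmlns:ore="ore" xmlns:dc="dc">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- Index every non-Aggregation child of rdf:RDF by the generated id
         of the closest ore:Aggregation that precedes it -->
    <xsl:key name="members"
             match="rdf:RDF/*[not(self::ore:Aggregation)]"
             use="generate-id(preceding-sibling::ore:Aggregation[1])"/>

    <xsl:template match="rdf:RDF">
        <rdf:root>
            <xsl:for-each select="ore:Aggregation">
                <rdf:RDF>
                    <dc:item rdf:about="{concat('item', position())}"/>
                    <!-- The aggregation itself -->
                    <xsl:copy-of select="."/>
                    <!-- All elements indexed under this aggregation, in document order -->
                    <xsl:copy-of select="key('members', generate-id())"/>
                </rdf:RDF>
            </xsl:for-each>
        </rdf:root>
    </xsl:template>
</xsl:stylesheet>
```

Any elements that appear before the first ore:Aggregation are indexed under an empty key value and are therefore left out of the output.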