I have an XML document with text and another one with a list of words. I want to find the words from the list in the text and enclose them in a new tag while leaving everything else as is. In short, the XSLT should do three things:
- Preserve all existing elements and attributes, including inline elements
- Identify all words that appear in an external list of words
- Wrap these words in an element (which ideally references the id included in the external list of words)
I managed to do some of these things; however, I have trouble bringing it all together and creating the desired output.
The input:
<doc>
  <header>Document with example sentences</header>
  <text>
    <div type="sentence" n="1">They<note>buyers</note> bought an apple and a banana.</div>
    <div type="sentence" n="2">They<note>shop</note> only had a strawberry and an apple left.</div>
  </text>
</doc>
The list:
<list>
  <fruit id="001">
    <english>apple</english>
    <translations>Apfel, pomme</translations>
  </fruit>
  <fruit id="002">
    <english>banana</english>
    <translations>Banane, banane</translations>
  </fruit>
  <fruit id="003">
    <english>strawberry</english>
    <translations>Erdbeere, fraise</translations>
  </fruit>
</list>
The desired output:
<doc>
  <header>Document with example sentences</header>
  <text>
    <div type="sentence" n="1">They<note>buyers</note> bought an <fruit ref="#001">apple</fruit> and a <fruit ref="#002">banana</fruit>.</div>
    <div type="sentence" n="2">They<note>shop</note> only had a <fruit ref="#003">strawberry</fruit> and an <fruit ref="#001">apple</fruit> left.</div>
  </text>
</doc>
I've tried two things so far. The first identifies the words from the list in the text; the second wraps words in a new element, but only for a hard-coded list of words. I can't figure out how to do both at the same time AND preserve all other elements in the document.
Identifying words from the list in the text:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity template: copy everything as is -->
  <xsl:template match="/ | @*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:variable name="list" select="document('liste.xml')"/>
  <xsl:template match="div">
    <xsl:variable name="text" select="."/>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <!-- after the div's content, add one <identified_fruit> per listed word found in it -->
      <xsl:for-each select="$list/list/fruit">
        <xsl:variable name="english" select="english"/>
        <xsl:if test="contains($text,$english)">
          <xsl:element name="identified_fruit">
            <xsl:value-of select="$english"/>
          </xsl:element>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Replacing words with elements:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <!-- identity template: copy everything as is -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="div">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- only the div's text() children are selected here; element children such as <note> are not processed -->
      <xsl:apply-templates select="text()" mode="wrap">
        <xsl:with-param name="words" as="xs:string+" select="'banana', 'apple', 'strawberry'"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
  <!-- wrap every occurrence of one of $words in an element named $wrapper-name -->
  <xsl:template match="text()" mode="wrap">
    <xsl:param name="words" as="xs:string+"/>
    <xsl:param name="wrapper-name" as="xs:string" select="'fruit'"/>
    <xsl:analyze-string select="." regex="{string-join($words, '|')}">
      <xsl:matching-substring>
        <xsl:element name="{$wrapper-name}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
How can I combine the two AND preserve all other elements in the document?
Any advice would be greatly appreciated!
Best, RaBa
It's not exactly clear what can be hard-coded. Perhaps this could work for you:
XSLT 2.0
Note that this looks for patterns, not words. If the input contains green applesauce, it will be returned as green <fruit ref="001">apple</fruit>sauce.
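A minimal XSLT 2.0 sketch of one way to combine the two attempts (not the stylesheet from this answer): keep the identity template so every element and attribute is copied, split each text node inside a div into runs of letters, and look each run up in the external list with a key. Because whole letter runs are compared, applesauce is left alone. Assumptions: the list is stored in liste.xml (the file name from the question), resolved relative to the stylesheet; a word is a run of letters (\p{L}+); matching is case-insensitive; and inflected forms such as apples are not matched. The key name fruit-by-english is only illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <!-- Assumption: the word list is in liste.xml, resolved relative to this stylesheet -->
  <xsl:variable name="list" select="document('liste.xml')"/>

  <!-- Hypothetical key: finds a <fruit> entry by its lower-cased English name -->
  <xsl:key name="fruit-by-english" match="fruit"
           use="lower-case(normalize-space(english))"/>

  <!-- Identity template: copy every attribute and node unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text inside a div (including inside inline elements such as <note>):
       split it into runs of letters and wrap each run that is a listed word.
       The braces are doubled because regex is an attribute value template. -->
  <xsl:template match="div//text()">
    <xsl:analyze-string select="." regex="\p{{L}}+">
      <xsl:matching-substring>
        <!-- look the whole word up in the external list (case-insensitive) -->
        <xsl:variable name="hit"
                      select="key('fruit-by-english', lower-case(.), $list)[1]"/>
        <xsl:choose>
          <xsl:when test="$hit">
            <fruit ref="#{$hit/@id}"><xsl:value-of select="."/></fruit>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <!-- spaces, punctuation and digits are copied as they are -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Comparing whole letter runs against a key, rather than building one alternation regex from the list, avoids partial matches and also avoids having to escape regex metacharacters in list entries. The trade-off is that a word split across an inline element (for example app<hi>le</hi>) is not recognized, since each text node is processed on its own.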