Converting characters in an XML text node to subscript or superscript with XSLT

I have an XML file structured like this:

<Chem>
  
  <Formula>
   CO{2}
  </Formula>
  
  <Name>
   Carbon Dioxide
  </Name>
</Chem>

How would I use XSLT to format the number (or all numbers in curly brackets) as subscript?

Answer by michael.hor257k:

In XSLT 2.0 or higher, you could convert all digit characters within curly braces to subscript using:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Formula">
    <xsl:copy>
        <xsl:analyze-string select="." regex="\{{(\d+)\}}">
            <xsl:matching-substring>
                <xsl:value-of select="translate(regex-group(1), '0123456789', '₀₁₂₃₄₅₆₇₈₉')"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Note the double escaping of the curly brace characters: first, they are doubled so that XSLT does not treat them as attribute value template (AVT) delimiters; then each is preceded by \ so that the regex engine reads it as a literal character.
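
If the doubled braces are hard to read, one possible alternative is to keep the pattern in a variable and reference it from the regex attribute. This is only a sketch: the variable name brace-digits is made up for illustration. Because select holds an XPath expression rather than an AVT, the braces inside the string literal need only the regex backslash, and the AVT in regex="{$brace-digits}" then supplies the finished pattern.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- hypothetical variable name; select is plain XPath, so the braces are not doubled here -->
<xsl:variable name="brace-digits" select="'\{(\d+)\}'"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Formula">
    <xsl:copy>
        <!-- the AVT evaluates the variable, yielding the regex \{(\d+)\} -->
        <xsl:analyze-string select="." regex="{$brace-digits}">
            <xsl:matching-substring>
                <xsl:value-of select="translate(regex-group(1), '0123456789', '₀₁₂₃₄₅₆₇₈₉')"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

This variant is intended to behave the same as the stylesheet above; only the way the pattern is written differs.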

The result using your example input:

<?xml version="1.0" encoding="UTF-8"?>
<Chem>
   <Formula>
   CO₂
  </Formula>
   <Name>
   Carbon Dioxide
  </Name>
</Chem>
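
If the result is going to be rendered as HTML (or by anything else that understands a sub element) rather than relying on Unicode subscript digits, a variation along these lines might work. It is a minimal sketch, assuming that a literal sub element is acceptable in your output vocabulary; that element is not part of the original input or answer. Indentation is left off here so the serializer does not add whitespace inside the mixed content of Formula.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Formula">
    <xsl:copy>
        <xsl:analyze-string select="." regex="\{{(\d+)\}}">
            <xsl:matching-substring>
                <!-- assumption: the consumer understands an HTML-style sub element -->
                <sub>
                    <xsl:value-of select="regex-group(1)"/>
                </sub>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

With the example input, the Formula element should then contain CO followed by a sub element holding 2, with the curly braces removed. For superscripts the same pattern could be reused with a sup element, provided the input marks them with some notation of its own.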