xsl:sort is not working when a special character [i.e. dot (.), colon (:), semicolon (;)] is present


xsl:sort is not working properly when special characters like ", : ;" are present. What might be the problem here?

I am trying to render HTML from XML data with the Java dom4j library. Though the real project has multiple sorting criteria, I am illustrating a summary of the specific problem here.

The expected sorting order is ASCIIbetical order. An online string sort tool also gives my expected output.

I have some XML data, for example:

<employee>
    <name>
        <![CDATA[test 1.8 - test]]>
    </name>
    <name>
        <![CDATA[test 4 - test]]>
    </name>
    <name>
        <![CDATA[Test 2 - test]]>
    </name>
</employee>

I am trying to sort them with an XSL transformer using xsl:sort. The snippet that does the sorting:

<xsl:for-each select="employee/name">
    <xsl:sort select="name" case-order="upper-first"/>
...

Expected Output Order:

Test 2 - test
test 1.8 - test
test 4 - test

The Output Order I am getting:

test 1.8 - test
Test 2 - test
test 4 - test

Similarly, I am getting this order:

; - ;
& - &
T1 - T1

instead of:

& - &
; - ;
T1 - T1

There are 3 best solutions below

2
michael.hor257k On

First, your code snippet sorts employees, not names.

Now, the result you get is the expected output after sorting the names. If you want the entry of Test 2 - test to come first because it starts with a capital T, then you need to do:

    <xsl:for-each select="name">
        <xsl:sort collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        ....
    </xsl:for-each> 

and you will need a processor that supports XSLT 2.0 or higher for that.

I don't see what the presence of the dot or any other character has to do with any of this.
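
To make the difference between the two orders concrete, here is a minimal Java sketch. The class and method names are made up for illustration, and it assumes that your XSLT 1.0 processor sorts with a language-sensitive collation similar to java.text.Collator, which may not hold for every processor. Such a collation looks at the letters and digits first and at upper/lower case only as a tie-breaker, so 1 < 2 < 4 decides the order before the capital T is ever considered. A codepoint comparison, on the other hand, puts every uppercase ASCII letter before every lowercase one.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of dom4j or of any XSLT processor.
class SortOrderDemo {

    // Sorts a copy of the input with String.compareTo, i.e. by UTF-16
    // code unit value, which equals ASCII order for plain ASCII strings.
    static List<String> byCodepoint(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts a copy of the input with a language-sensitive English collator.
    // Assumption: this approximates what xsl:sort does by default.
    static List<String> byCollator(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collator english = Collator.getInstance(Locale.ENGLISH);
        copy.sort(english);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names =
                List.of("test 1.8 - test", "test 4 - test", "Test 2 - test");
        System.out.println("codepoint: " + byCodepoint(names));
        System.out.println("collator:  " + byCollator(names));
    }
}
```

With the three names from the question, the codepoint sort puts "Test 2 - test" first, while the collator sort puts "test 1.8 - test" first. The dot plays no part in either result.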

0
michael.hor257k On

The expected sorting order is ASCIIbetical order.

If we accept the definition of "sorting in ASCIIbetical order" as sorting by the order in which characters appear in the ASCII table (i.e. their Unicode codepoint values), while considering only printable characters (i.e. characters in the range from &#32; to &#126;), then we could do the sorting as shown in the following example:

XML

<input>
   <char ascii="32"> </char>
   <char ascii="45">-</char>
   <char ascii="95">_</char>
   <char ascii="44">,</char>
   <char ascii="59">;</char>
   <char ascii="58">:</char>
   <char ascii="33">!</char>
   <char ascii="63">?</char>
   <char ascii="47">/</char>
   <char ascii="46">.</char>
   <char ascii="96">`</char>
   <char ascii="94">^</char>
   <char ascii="126">~</char>
   <char ascii="39">'</char>
   <char ascii="34">"</char>
   <char ascii="40">(</char>
   <char ascii="41">)</char>
   <char ascii="91">[</char>
   <char ascii="93">]</char>
   <char ascii="123">{</char>
   <char ascii="125">}</char>
   <char ascii="64">@</char>
   <char ascii="36">$</char>
   <char ascii="42">*</char>
   <char ascii="92">\</char>
   <char ascii="38">&amp;</char>
   <char ascii="35">#</char>
   <char ascii="37">%</char>
   <char ascii="43">+</char>
   <char ascii="60">&lt;</char>
   <char ascii="61">=</char>
   <char ascii="62">&gt;</char>
   <char ascii="124">|</char>
   <char ascii="48">0</char>
   <char ascii="49">1</char>
   <char ascii="50">2</char>
   <char ascii="51">3</char>
   <char ascii="52">4</char>
   <char ascii="53">5</char>
   <char ascii="54">6</char>
   <char ascii="55">7</char>
   <char ascii="56">8</char>
   <char ascii="57">9</char>
   <char ascii="97">a</char>
   <char ascii="65">A</char>
   <char ascii="98">b</char>
   <char ascii="66">B</char>
   <char ascii="99">c</char>
   <char ascii="67">C</char>
   <char ascii="100">d</char>
   <char ascii="68">D</char>
   <char ascii="101">e</char>
   <char ascii="69">E</char>
   <char ascii="102">f</char>
   <char ascii="70">F</char>
   <char ascii="103">g</char>
   <char ascii="71">G</char>
   <char ascii="104">h</char>
   <char ascii="72">H</char>
   <char ascii="105">i</char>
   <char ascii="73">I</char>
   <char ascii="106">j</char>
   <char ascii="74">J</char>
   <char ascii="107">k</char>
   <char ascii="75">K</char>
   <char ascii="108">l</char>
   <char ascii="76">L</char>
   <char ascii="109">m</char>
   <char ascii="77">M</char>
   <char ascii="110">n</char>
   <char ascii="78">N</char>
   <char ascii="111">o</char>
   <char ascii="79">O</char>
   <char ascii="112">p</char>
   <char ascii="80">P</char>
   <char ascii="113">q</char>
   <char ascii="81">Q</char>
   <char ascii="114">r</char>
   <char ascii="82">R</char>
   <char ascii="115">s</char>
   <char ascii="83">S</char>
   <char ascii="116">t</char>
   <char ascii="84">T</char>
   <char ascii="117">u</char>
   <char ascii="85">U</char>
   <char ascii="118">v</char>
   <char ascii="86">V</char>
   <char ascii="119">w</char>
   <char ascii="87">W</char>
   <char ascii="120">x</char>
   <char ascii="88">X</char>
   <char ascii="121">y</char>
   <char ascii="89">Y</char>
   <char ascii="122">z</char>
   <char ascii="90">Z</char>
</input>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="ascii">!"#$%&amp;'()*+,-./0123456789:;&lt;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~</xsl:param>
<xsl:param name="en">-_,;:!?/.`^~'"()[]{}@$*\&amp;#%+&lt;=>|0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ</xsl:param>

<xsl:template match="/input">
    <root>
        <xsl:for-each select="char">
            <xsl:sort select="translate(., $ascii, $en)" lang="en"/>
            <xsl:copy-of select="."/>
        </xsl:for-each> 
    </root>
</xsl:template>

</xsl:stylesheet>

Result*

<?xml version="1.0" encoding="UTF-8"?><root>
<char ascii="32"> </char>
<char ascii="33">!</char>
<char ascii="34">"</char>
<char ascii="35">#</char>
<char ascii="36">$</char>
<char ascii="37">%</char>
<char ascii="38">&amp;</char>
<char ascii="39">'</char>
<char ascii="40">(</char>
<char ascii="41">)</char>
<char ascii="42">*</char>
<char ascii="43">+</char>
<char ascii="44">,</char>
<char ascii="45">-</char>
<char ascii="46">.</char>
<char ascii="47">/</char>
<char ascii="48">0</char>
<char ascii="49">1</char>
<char ascii="50">2</char>
<char ascii="51">3</char>
<char ascii="52">4</char>
<char ascii="53">5</char>
<char ascii="54">6</char>
<char ascii="55">7</char>
<char ascii="56">8</char>
<char ascii="57">9</char>
<char ascii="58">:</char>
<char ascii="59">;</char>
<char ascii="60">&lt;</char>
<char ascii="61">=</char>
<char ascii="62">&gt;</char>
<char ascii="63">?</char>
<char ascii="64">@</char>
<char ascii="65">A</char>
<char ascii="66">B</char>
<char ascii="67">C</char>
<char ascii="68">D</char>
<char ascii="69">E</char>
<char ascii="70">F</char>
<char ascii="71">G</char>
<char ascii="72">H</char>
<char ascii="73">I</char>
<char ascii="74">J</char>
<char ascii="75">K</char>
<char ascii="76">L</char>
<char ascii="77">M</char>
<char ascii="78">N</char>
<char ascii="79">O</char>
<char ascii="80">P</char>
<char ascii="81">Q</char>
<char ascii="82">R</char>
<char ascii="83">S</char>
<char ascii="84">T</char>
<char ascii="85">U</char>
<char ascii="86">V</char>
<char ascii="87">W</char>
<char ascii="88">X</char>
<char ascii="89">Y</char>
<char ascii="90">Z</char>
<char ascii="91">[</char>
<char ascii="92">\</char>
<char ascii="93">]</char>
<char ascii="94">^</char>
<char ascii="95">_</char>
<char ascii="96">`</char>
<char ascii="97">a</char>
<char ascii="98">b</char>
<char ascii="99">c</char>
<char ascii="100">d</char>
<char ascii="101">e</char>
<char ascii="102">f</char>
<char ascii="103">g</char>
<char ascii="104">h</char>
<char ascii="105">i</char>
<char ascii="106">j</char>
<char ascii="107">k</char>
<char ascii="108">l</char>
<char ascii="109">m</char>
<char ascii="110">n</char>
<char ascii="111">o</char>
<char ascii="112">p</char>
<char ascii="113">q</char>
<char ascii="114">r</char>
<char ascii="115">s</char>
<char ascii="116">t</char>
<char ascii="117">u</char>
<char ascii="118">v</char>
<char ascii="119">w</char>
<char ascii="120">x</char>
<char ascii="121">y</char>
<char ascii="122">z</char>
<char ascii="123">{</char>
<char ascii="124">|</char>
<char ascii="125">}</char>
<char ascii="126">~</char>
</root>

(*)IMPORTANT:

The above result was produced using the Xalan 2.7.2 processor with lang="en". You may easily get a different result if using a different processor and/or a different lang value. In such case you need to adjust the $en parameter to reflect the actual sorting order of your environment.

And of course, as shown in my other answer, all this is unnecessary if you are using a processor that supports XSLT 2.0+.
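
If you need to work out the $en value for your own environment, the following Java sketch may help. It rests on an assumption that you should verify: that your processor is Java-based and that its xsl:sort delegates to java.text.Collator for the given lang. The class and method names are hypothetical. It sorts the printable ASCII characters from ! (33) to ~ (126) with the collator of a given locale and returns them as one string.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Xalan or of any other XSLT processor.
class CollationOrderProbe {

    // Returns the printable ASCII characters '!' (33) to '~' (126),
    // arranged in the order in which the locale's Collator sorts them.
    static String printableAsciiInCollationOrder(Locale locale) {
        List<String> chars = new ArrayList<>();
        for (char c = '!'; c <= '~'; c++) {
            chars.add(String.valueOf(c));
        }
        chars.sort(Collator.getInstance(locale));
        return String.join("", chars);
    }

    public static void main(String[] args) {
        // Candidate value for the $en parameter; verify it against
        // the actual output of your processor before relying on it.
        System.out.println(printableAsciiInCollationOrder(Locale.ENGLISH));
    }
}
```

Remember to escape & as &amp;amp; and < as &amp;lt; before pasting the printed string into the stylesheet.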

0
forty-two On

This might not fit your requirement at all, but noting that DOM4J uses "live" lists for element content, sorting the nodes directly in code becomes quite elegant:

doc.getRootElement().content().sort(Comparator.comparing(Node::getText));

Of course it messes up any existing formatting, but that can probably be fixed with a pretty printing option.