I've been working on CSV-to-XML conversions using the stylesheet from this question: XSLT 2.0 to convert CSV to XML format

I needed to account for both comma and pipe delimiters, so I changed the stylesheet to this:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="csv-uri" as="xs:string" select="'file:///c:/test.csv'"/>

    <xsl:template match="/" name="csv2xml">
        <!-- Rows and Row are placeholder names; the original wrapper elements are not shown -->
        <Rows>
            <xsl:choose>
                <xsl:when test="unparsed-text-available($csv-uri)">
                    <xsl:variable name="csv" select="unparsed-text($csv-uri)" />
                    <xsl:variable name="pipe" select="'\|'"/>
                    <xsl:analyze-string select="replace($csv,$pipe,',')" regex='\r\n?|\n'>
                        <xsl:non-matching-substring>
                            <xsl:if test="not(position()=0)">
                                <Row>
                                    <xsl:for-each select="tokenize(.,',')">
                                        <xsl:element name="Column_{position()}">
                                            <xsl:value-of select="."/>
                                        </xsl:element>
                                    </xsl:for-each>
                                </Row>
                            </xsl:if>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:variable name="error">
                        <xsl:text>Error reading file: "</xsl:text>
                        <xsl:value-of select="$csv-uri"/>
                        <xsl:text>"</xsl:text>
                    </xsl:variable>
                    <xsl:message>
                        <xsl:value-of select="$error"/>
                    </xsl:message>
                    <xsl:value-of select="$error"/>
                </xsl:otherwise>
            </xsl:choose>
        </Rows>
    </xsl:template>
</xsl:stylesheet>

However, I've been having some difficulty accounting for quoted values from the input.

Currently, if a row in the CSV is:


the quoted value "testing,1,1,1" gets split into four columns in the XML output instead of one.


    <?xml version="1.0" encoding="UTF-8"?>

I've done some research, and it looks like using regex='("[^"]*")+' can accomplish this, but I'm not sure how to work it in without breaking something I need (probably somewhere in the xsl:analyze-string block?). I need help, please! It's probably something simple, so please school me or point me in the right direction. Any advice would be helpful.


You can just add another xsl:analyze-string to process the xsl:non-matching-substring from the first xsl:analyze-string...

CSV Input


Modified XSLT 2.0

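<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="csv-uri" as="xs:string" select="'file:///c:/users/dhaley/desktop/so.csv'"/>

    <xsl:template match="/" name="csv2xml">
        <!-- Rows and Row are placeholder names; the original wrapper elements are not shown -->
        <Rows>
            <xsl:choose>
                <xsl:when test="unparsed-text-available($csv-uri)">
                    <xsl:variable name="csv" select="unparsed-text($csv-uri)" />
                    <xsl:variable name="pipe" select="'\|'"/>
                    <xsl:analyze-string select="replace($csv,$pipe,',')" regex='\r\n?|\n'>
                        <xsl:non-matching-substring>
                            <xsl:if test="not(position()=0)">
                                <Row>
                                    <xsl:analyze-string select="." regex="&quot;([^&quot;]*)&quot;,?|([^,]+),?">
                                        <xsl:matching-substring>
                                            <xsl:element name="Column_{position()}">
                                                <xsl:value-of select="normalize-space(concat(regex-group(1),regex-group(2)))"/>
                                            </xsl:element>
                                        </xsl:matching-substring>
                                        <xsl:non-matching-substring>
                                            <xsl:element name="Column_{position()}"/>
                                        </xsl:non-matching-substring>
                                    </xsl:analyze-string>
                                </Row>
                            </xsl:if>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:variable name="error">
                        <xsl:text>Error reading file: "</xsl:text>
                        <xsl:value-of select="$csv-uri"/>
                        <xsl:text>"</xsl:text>
                    </xsl:variable>
                    <xsl:message>
                        <xsl:value-of select="$error"/>
                    </xsl:message>
                    <xsl:value-of select="$error"/>
                </xsl:otherwise>
            </xsl:choose>
        </Rows>
    </xsl:template>
</xsl:stylesheet>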

XML Output


The regex in the second xsl:analyze-string uses capture groups so that the quotes and the trailing comma are left out of the value. Here it is without the XML escaping of the quotes, which is easier to read:

    "([^"]*)",?|([^,]+),?

If you want to keep the quotes, move them inside the parens:

    ("[^"]*"),?|([^,]+),?

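To try the field-splitting regex on its own, without a CSV file, something like the following minimal sketch might help. It is not part of the original stylesheet: the `row` parameter, its sample value, the `main` template name, and the `Row` element are all assumptions made for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <!-- Hypothetical sample row (not the input from the question):
         one quoted field containing commas, then a, an empty field, and b -->
    <xsl:param name="row" select="'&quot;testing,1,1,1&quot;,a,,b'"/>

    <!-- Named entry point, so no source document is needed -->
    <xsl:template name="main">
        <Row>
            <xsl:analyze-string select="$row" regex="&quot;([^&quot;]*)&quot;,?|([^,]+),?">
                <!-- Group 1 holds a quoted value without its quotes,
                     group 2 holds an unquoted value; only one is non-empty per match -->
                <xsl:matching-substring>
                    <xsl:element name="Column_{position()}">
                        <xsl:value-of select="concat(regex-group(1),regex-group(2))"/>
                    </xsl:element>
                </xsl:matching-substring>
                <!-- A lone comma matches neither alternative, so an empty field
                     arrives here and becomes an empty column -->
                <xsl:non-matching-substring>
                    <xsl:element name="Column_{position()}"/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </Row>
    </xsl:template>
</xsl:stylesheet>
```

Run with an XSLT 2.0 processor using `main` as the initial template (with Saxon, for instance, via the `-it:main` option), this should produce four columns: `testing,1,1,1`, `a`, an empty `Column_3`, and `b`. Swapping in the quote-keeping regex only changes what group 1 captures; the rest of the template stays the same.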