Convert XML to CSV with Scriptella, how to get attribute values?


I found an example of converting XML to CSV. The example uses this input structure:

<!-- Demo input for ETL -->
<CATALOG>
    <CD>
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>1985</YEAR>
    </CD>
</CATALOG>

For this file structure, the Scriptella code is:

<script connection-id="out">Title;Artist;Country;Company;Price;Year</script>
<query connection-id="in">
    <!--XPath which selects all CD elements in a catalog-->
    /CATALOG/CD
    <!--Outputs all matched elements-->
    <script connection-id="out" if="rownum>1">$TITLE;$ARTIST;$COUNTRY;$COMPANY;$PRICE;$YEAR</script>
</query>

How can I convert an XML file that has the following structure instead?

<CATALOG>
    <CD title='Empire Burlesque' artist='Bob Dylan'  country='USA'/>
    .............
    <CD title='Empire Burlesque' artist='Bob Dylan'  country='USA'/>
</CATALOG>

How do I get to the values of attributes in XML?

There are 3 best solutions below

<CATALOG>
    <CD title='Empire Burlesque' artist='Bob Dylan'  country='USA'/>
    .............
    <CD title='Empire Burlesque' artist='Bob Dylan'  country='USA'/>
</CATALOG>

You can get attribute values with an XPath expression such as this one, which selects the title attribute of the first CD element:

/CATALOG/CD[1]/@title

You access attributes by name, the same way as tags.

In your case, after selecting the CD nodes with the XPath /CATALOG/CD, you can access the tag and its attributes this way:

$CD      -> ''     (because CD is an empty tag)
$title   -> 'Empire Burlesque' 
$artist  -> 'Bob Dylan' 
$country -> 'USA'
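
Put together for the attribute-based file from the question, a minimal sketch could look like the one below. It assumes the input is saved as data.xml and that the output is written through the text driver to data.csv (both file names are placeholders), and it relies on attribute values being exposed as variables as described above. It has not been checked against a specific Scriptella version.

```xml
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
  <!-- Assumption: the attribute-based catalog is saved as data.xml -->
  <connection id="in" driver="xpath" url="data.xml"/>
  <!-- Assumption: plain text output to data.csv; separators are written by hand -->
  <connection id="out" driver="text" url="data.csv"/>
  <!-- Header line -->
  <script connection-id="out">Title;Artist;Country</script>
  <query connection-id="in">
    <!-- Selects every CD element in the catalog -->
    /CATALOG/CD
    <!-- Attribute names are case-sensitive: title, artist, country -->
    <script connection-id="out">$title;$artist;$country</script>
  </query>
</etl>
```

Each matched CD element should then contribute one semicolon-separated line built from its title, artist and country attributes.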

You can also access elements outside the currently selected node by using the node.getString() function with an XPath expression, for example:

${node.getString("../CATALOG")} 

With this function you can select elements (tags) by path, and by attribute value using the bracket-at notation, for example:

${node.getString("../CATALOG/CD[@title='Empire Burlesque']")}

You can also use an index, rather than an attribute, to select an element from the set:

${node.getString("../CATALOG/CD[2]")} 

This index notation also works with variables, as in:

XML file: <A><B>1</B><B>2</B><B>3</B></A>
In Scriptella:
/A
${B[2]}

First, you need properly declared drivers for all your connections. You cannot parse XML with Scriptella unless you use the xpath driver. More information here: http://scriptella.org/reference/drivers.html

Now for the magic bits:

- You could use Java libraries as an alternative, but since these 2 drivers are supported out of the box, I suggest going with them.
- You wish to import XML -> the xpath driver is needed.
- You wish to export CSV -> the csv driver is needed.
- The text driver can also be used for outputting CSV data, but you'd have to handle quoting and separators manually.

If your XML data is in the file data.xml and you wish to export it as CSV data to the file data.csv, I would suggest the following Scriptella ETL script:

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
  <connection id="in" driver="xpath" url="data.xml" />
  <connection id="out" driver="csv" url="data.csv">
    quote=
    separator=;
  </connection>
  <script connection-id="out">
    TITLE,ARTIST,COUNTRY,COMPANY,PRICE,YEAR
  </script>
  <query connection-id="in">
    /CATALOG/CD
    <script connection-id="out">
      $TITLE,$ARTIST,$COUNTRY,$COMPANY,$PRICE,$YEAR
    </script>
  </query>
</etl>

Please respect the case used in the XML source. You must use $TITLE, not $title or $Title, since <TITLE> is what appears in your XML source.

The rownum test is not needed for such an ETL task.