I'm trying to convert the following XML to CSV using XPath 3.0 (xidel --xpath):
<?xml version="1.0" encoding="utf-8" ?>
<csv>
<record>
<field1>A</field1>
<field2>B</field2>
<field3>C</field3>
</record>
<record>
<field2> </field2>
<field3></field3>
</record>
<record>
<field1>,,</field1>
<field2>""</field2>
<field3>..</field3>
<field3>.
.</field3>
</record>
</csv>
My expected output would be:
field1,field2,field3
A,B,C
, ,""
",,","""""",".
."
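For reference, the quoting rule behind that output can be sketched in plain bash, independent of XPath: a field is wrapped in double quotes only when it contains a comma, a double quote, or a newline, and embedded double quotes are doubled. The csv_escape name is a hypothetical helper introduced here for illustration; it does not cover the missing-versus-empty distinction shown on the third line of the expected output.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script): quote one CSV field
# only when it contains a comma, a double quote, or a newline.
csv_escape() {
  local s=$1
  local q='"'
  if [[ $s == *,* || $s == *"$q"* || $s == *$'\n'* ]]; then
    # Double every embedded quote, then wrap the whole field in quotes.
    s=${s//$q/$q$q}
    printf '"%s"' "$s"
  else
    # Nothing special in the field: emit it unchanged.
    printf '%s' "$s"
  fi
}
```

For example, csv_escape ',,' prints ",," wrapped in double quotes, while csv_escape 'A' prints A unchanged.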
I've got a few problems (the first one isn't specific to xidel):
1. I get the field names with distinct-values(/csv/record/*/name()); how can I use that sequence for extracting the data in the records?
2. I would like to differentiate between a missing and an empty field, but the text() selector of xidel doesn't seem to care about that; is it an XPath feature or a xidel bug?
3. I can't make return work; does xidel use a different syntax?
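On the second point, as far as the XPath data model goes, an element such as <field3></field3> has no text node child at all, so text() yields the empty sequence both for an empty element and for a missing one. A minimal sketch of telling the two apart by testing the element rather than its text, assuming the same file.xml and the xidel --xpath invocation used in this post (field3 is hard-coded here only to keep the example short):

```bash
#!/bin/bash
# Sketch only: classify field3 in every record as missing, empty, or non-empty.
# The check is made on the element itself, not on its text() children.
xpath=$(cat <<'EOF'
/csv/record/(
  if (empty(field3)) then 'missing'
  else if (string(field3[last()]) = '') then 'empty'
  else 'has text'
)
EOF
)
# Run only when xidel and the sample file are actually available.
if command -v xidel >/dev/null 2>&1 && [ -f file.xml ]; then
  xidel --xpath "$xpath" file.xml
fi
```

Whether xidel's parser keeps whitespace-only text, such as field2 in the second record, is a separate question that this sketch does not settle.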
Update
I solved #1 myself and #3 was resolved by @ConalTuohy in his comment.
Here's what I got now:
#!/bin/bash
IFS='' read -r -d '' xpath <<'EOF'
let $csv-escape-string := function($str as xs:string) as xs:string {
      if ( matches( $str, ',|"|\n' ) )
      then
        concat('"',replace($str,'"','""'),'"')
      else
        $str
    },
    $fields-names := distinct-values(/csv/record/*/name()),
    $csv := (
      string-join( $fields-names, ',' ),
      /csv/record/string-join(
        (
          for $fn in $fields-names
          return $csv-escape-string(string( *[name()=$fn][last()]/text() ))
        ), ','
      )
    )
return $csv
EOF
xidel --xpath "$xpath" file.xml
But the output isn't what I would like it to be:
field1,field2,field3
A,B,C
,,
",,","""""",".
."
Could someone try it with another XPath 3 processor to make sure that it is xidel that is normalizing text()?
I've made "better" alternatives afterwards.
The main improvement is that now the selector /csv/record only needs to be specified once.
Differentiating between null and empty values, and quoting fields only when required: