Converting a CSV file with double quotes to a pipe-delimited format using sed

1k Views Asked by At

I'm trying to convert CSV files into pipe-delimited text files.

Let's say I have a file called sample.csv:

aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z

I'd like to convert this into an output that looks like this:

aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z

Now after tons of searching, I have come the closest using this sed command:

sed -r 's/""/\v/g;s/("([^"]+)")?,/\2\|/g;s/"([^"]+)"$/\1/;s/\v/"/g'

However, the output that I received was:

aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|pppqqq|rrr" sss|ttt,"uuu|Z

Where the expected for the 9th column should have been ppp"qqq" but the result removed the double quotes and what I got was pppqqq.

I have been playing around with this for a while, but to no avail. Any help regarding this would be highly appreciated.

3

There are 3 best solutions below

0
On BEST ANSWER

The problem with sample.csv is that it mixes non-quoted fields (containing quotes) with fully quoted fields (that should be treated as such).

You can't have both at the same time. Either all fields are (treated as) unquoted and quotes are preserved, or all fields containing a quote (or separator) are fully quoted and the quotes inside are escaped with another quote.

So, sample.csv should become:

"aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z

to give you the desired result (using a csv parser):

aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
3
On

As suggested in comments sed or any other Unix tool is not recommended for this kind of complex CSV string. It is much better to use a dedicated CSV parser like this in PHP:

$s = 'aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z';
echo implode('|', str_getcsv($s));
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|nnnooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
0
On

Have the same problem. I found right result with https://www.papaparse.com/demo Here is a FOSS on github. So maybe you can check how it works. With the source of [ "aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z ] The result appears in the browser console: [1]: https://i.stack.imgur.com/OB5OM.png