handling data while replacing delimiter from double quotes

47 Views Asked by At

I'm trying to remove the pipe delimiter from double quotes,its replacing with following code but its not handling empty filed and numeric fields. Any help would be appreciated.

Code:

 sed 's/|*\([^"]\)/\1/g;s/\([^"]\),*/\1/g' test.csv

INPUT:

"Southern|Palms"|"AA|None"|"4"|"Kenya"|"MBA"|"Ken|Coast"|"Y"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"||40.00|"HB|AI"||||||"Y"||||

OUTPUT:

"SouthernPalms"|"AANone"|"4"|"Kenya"|"MBA"|"KenCoast"|"Y"1/11/2019 00:00:0030/4/2020 00:00:00|"TH"40.00|"HBAI"|"Y"|

Expected Output:

"Southern Palms"|"AA None"|"4"|"Kenya"|"MBA"|"Ken Coast"|"Y"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"||40.00|"HB AI"||||||"Y"||||
1

There are 1 best solutions below

1
Corentin Limier On
sed -E 's/(^|[^"|])\|($|[^"|])/\1 \2/g' <file>

Returns :

"Southern Palms"|"AA None"|"4"|"Kenya"|"MBA"|"Ken Coast"|"Y"|1/11/2019 00:00:00 30/4/2020 00:00:00|"TH"||40.00|"HB AI"||||||"Y"||||

It won't work for lines like this and I'm sure there are many other examples that could break the regexp :

"Southern|Palms"|"AA|||None"|"4"|"Kenya"|"MBA"|"Ken|Coast"|"Y"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"||40.00|"HB|AI"||||||"Y"||||

But awk/sed are not the proper tools to read csv with quotechar and escapechar. For complex files like this, I would consider using python or any CSV reader that take these options into account.