Hi, I have a CSV file where the encapsulator character is not escaped properly.
Example:
[email protected],"uhrege gerjhhg er<span style="background-color: rgb(0,153,0);">eriueiru kernger</span><font color="#009900"><span style="background-color: rgb(255,255,255);"> weiufhuweifbw fhew fibwefbw</span></font><div><font color="#009900"><span style="background-color: rgb(255,255,255);">wekifbwe fewf</span></font></div><div><font color="#009900"><span style="background-color: rgb(255,255,255);">weiuifgewbfjew f</span></font></div>",18-Oct-2016,
Delimiter -> ,
Encapsulator -> "
It breaks when I try to read it with the commons-csv reader, which throws an 'invalid char between encapsulated token and delimiter' exception.
However, Microsoft Excel seems to open the file perfectly. Any ideas on how to proceed?
How does one parse CSV files where the encapsulator is not escaped properly? Excel seems to open such files fine.
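To see where a strict parser gives up, here is a small Java sketch that applies the RFC 4180-style rule: inside a quoted field, a doubled quote ("") is an escaped quote, and a lone quote ends the field, so the next character must be the delimiter or the end of the line. StrictCsvCheck and firstInvalidCharAfterQuote are made-up names, not part of commons-csv. On a line shaped like the sample, the first violation is the b right after style=", which is presumably the spot where commons-csv raises its exception.

```java
final class StrictCsvCheck {
    /**
     * Hypothetical helper: returns the index of the first character that
     * directly follows a closing quote but is neither the delimiter nor the
     * end of the line, or -1 if the line obeys that rule.
     */
    static int firstInvalidCharAfterQuote(String line, char delim, char quote) {
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != quote) {
                    continue; // ordinary character inside the quoted field
                }
                if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    i++; // "" is an escaped quote: skip both characters
                    continue;
                }
                inQuotes = false; // a lone quote closes the field
                if (i + 1 < line.length() && line.charAt(i + 1) != delim) {
                    return i + 1; // e.g. the 'b' in style="background-color
                }
            } else if (c == delim) {
                atFieldStart = true;
            } else {
                // A quote only opens a quoted field at the start of a field.
                inQuotes = atFieldStart && c == quote;
                atFieldStart = false;
            }
        }
        return -1;
    }
}
```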
If you can't fix this at the source (i.e. generate well-formed CSV) and you want to parse it yourself, you could go the easy way:
- scan field1 up to ," (a comma followed by a quote)
- scan field2 up to ", (a quote followed by a comma)
- the rest is field3 (minus the trailing comma?).
Of course, if ", occurs inside the HTML field, there's a problem. You could solve that by first scanning forward to the first ," and then scanning backwards from the end of the line to the last ", sequence.
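A minimal Java sketch of that forward/backward scan, assuming exactly three fields per line with only the middle one quoted, as in the example. ThreeFieldSplitter and split are hypothetical names. String.indexOf finds the first ," and String.lastIndexOf finds the last ", so a ", inside the HTML does not end field2 early.

```java
final class ThreeFieldSplitter {
    /**
     * Hypothetical helper for lines shaped like  field1,"field2",field3
     * where field2 may contain unescaped quotes.
     * Returns null if the expected markers are not found.
     */
    static String[] split(String line) {
        // Forward scan: field1 ends at the first ," (comma + quote).
        int open = line.indexOf(",\"");
        // Backward scan: field2 ends at the last ", (quote + comma).
        int close = line.lastIndexOf("\",");
        // field2 starts at open + 2, so close must not come before that.
        if (open < 0 || close < open + 2) {
            return null;
        }
        String field1 = line.substring(0, open);
        String field2 = line.substring(open + 2, close);
        String field3 = line.substring(close + 2);
        // Drop a single trailing comma, as seen in the sample line.
        if (field3.endsWith(",")) {
            field3 = field3.substring(0, field3.length() - 1);
        }
        return new String[] {field1, field2, field3};
    }
}
```

Returning null for lines without the markers lets you log those lines and deal with them by hand instead of guessing.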
If there are more fields than you show here, you could look for a , next to a " (in either order; a boundary between two quoted fields looks like ",") and hope those combinations do not appear in the field data.
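One way to code that heuristic for any number of fields is a small lenient tokenizer: a quote opens a field only at the start of a field (line start or right after the delimiter) and closes it only when it is followed by the delimiter or the end of the line; every other quote is kept as literal text. This is a hypothetical sketch (LenientCsvSplitter is not a library class). It does not unescape doubled quotes, and it still splits wrongly if the field data itself contains ", (a quote followed by a comma).

```java
import java.util.ArrayList;
import java.util.List;

final class LenientCsvSplitter {
    /** Splits one line using the "quote next to delimiter" heuristic. */
    static List<String> split(String line, char delim, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                // Only a quote followed by the delimiter or end of line closes the field.
                boolean closes = c == quote
                        && (i + 1 == line.length() || line.charAt(i + 1) == delim);
                if (closes) {
                    inQuotes = false;
                } else {
                    current.append(c); // stray quotes are kept as text
                }
            } else if (c == delim) {
                fields.add(current.toString());
                current.setLength(0);
                atFieldStart = true;
                continue;
            } else if (c == quote && atFieldStart) {
                inQuotes = true; // opening quote, not part of the value
            } else {
                current.append(c);
            }
            atFieldStart = false;
        }
        fields.add(current.toString()); // last field (empty after a trailing comma)
        return fields;
    }
}
```

On the sample line this yields four fields: the email, the HTML with its inner quotes intact, the date, and an empty field from the trailing comma.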