CSV parsing with univocity-parsers and backslash-escaped quotes

I'm having trouble parsing CSV with backslash-escaped quotes (\"). Most lines in the source CSV don't contain escaped quotes, but for the ones that do I can't find settings that parse them correctly.

CSV example (each line with 4 columns):

1,,No quote escape,test
2,,"One quote escape\"",test
3,,"Two \"quote escapes\"",test
4,,"Two \"quote escapes\" 2",test

CSV parser settings:

CsvFormat:
        Comment character=#
        Field delimiter=,
        Line separator (normalized)=\n
        Line separator sequence=\r\n
        Quote character="
        Quote escape character=\
        Quote escape escape character=null

Code snippet:

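import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.nio.charset.StandardCharsets;

CsvParserSettings settings = new CsvParserSettings();

// Auto-detect the delimiter and line separator from the input.
settings.setDelimiterDetectionEnabled(true);
settings.setLineSeparatorDetectionEnabled(true);
// Fields are quoted with " and embedded quotes are escaped with a backslash.
settings.getFormat().setQuote('"');
settings.getFormat().setQuoteEscape('\\');

CsvParser parser = new CsvParser(settings);

parser.beginParsing(file, StandardCharsets.UTF_8);
...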

Lines are parsed correctly until two escaped quotes are present in one line. Expected parsed lines are:

- 1,null,No quote escape,test
- 2,null,One quote escape",test
- 3,null,Two "quote escapes",test
- 4,null,Two "quote escapes" 2,test
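
To make the expected interpretation concrete, here is a minimal hand-rolled sketch of the rules I have in mind: a quoted field ends at the first unescaped ", \" inside a quoted field becomes a literal ", and an empty unquoted field becomes null. The class name BackslashCsvLine is hypothetical, this is not how univocity-parsers works internally, and it ignores multi-line fields and other edge cases.

```java
import java.util.ArrayList;
import java.util.List;

public class BackslashCsvLine {

    // Splits one CSV line into fields, treating \" inside a quoted field as a literal quote.
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;   // currently inside a quoted field
        boolean wasQuoted = false;  // the current field started with a quote

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote: keep the quote, drop the backslash
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // unescaped quote closes the field
                } else {
                    current.append(c);
                }
            } else if (c == '"' && current.length() == 0 && !wasQuoted) {
                inQuotes = true;         // opening quote at the start of a field
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(finish(current, wasQuoted));
                current.setLength(0);
                wasQuoted = false;
            } else {
                current.append(c);
            }
        }
        fields.add(finish(current, wasQuoted));
        return fields;
    }

    // Empty unquoted fields are reported as null, mirroring the expected output above.
    private static String finish(StringBuilder sb, boolean wasQuoted) {
        return (sb.length() == 0 && !wasQuoted) ? null : sb.toString();
    }
}
```

Calling BackslashCsvLine.parse on the fourth example line yields the four fields 4, null, Two "quote escapes" 2 and test.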

Best answer:

Upon further inspection, I found that this is a known problem: there is an existing issue reported against univocity-parsers v2.9.1.
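
Until that issue is resolved, one possible workaround (my own suggestion, not something taken from the issue) is to rewrite each \" as a doubled quote "" before the text reaches the parser, and leave the quote escape at the library's default of ". A minimal sketch, assuming the data contains no other meaningful backslash sequences (for example, a literal backslash directly before a closing quote would be mangled); the class and method names are hypothetical:

```java
public class QuoteEscapeRewriter {

    // Rewrites backslash-escaped quotes (\") as doubled quotes ("").
    // Assumes every \" in the input is an escaped quote inside a quoted field.
    public static String toDoubledQuotes(String line) {
        return line.replace("\\\"", "\"\"");
    }
}
```

For example, the line 4,,"Two \"quote escapes\" 2",test becomes 4,,"Two ""quote escapes"" 2",test, which a parser configured with " as both quote and quote escape should read as four columns.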