Parse a CSV file without any formatting other than splitting on the delimiter


I have data in the following format:

PAL : PAL : NF : "INCOME"."Taxable"
PAL : PAL : NF : "EXPENSES"."TotalExpenses"
PAL : PAL : NF : "EXPENSES"."Exceptional"

In Java, I just want to split the data on the delimiter without any other formatting; the quotes should also appear in the output. I usually use Univocity, but with the code below the quotes are lost:

    //Simple CSV File Read
    List<String[]> allRows;
    try {
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        settings.getFormat().setDelimiter(':');

        CsvParser parser = new CsvParser(settings);
        allRows = parser.parseAll(new FileReader(new File(csvFile)));
        int i =0, cols=0;
        for(String[] str:allRows){
            i++;
            cols = str.length;

            for(String s:str)
                System.out.print(s+" == ");

            System.out.println("");
            if(i == 10) break;
        }       
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }    

The output was as shown below. The period sits between two quoted parts, and the outer quotes are dropped. I expect the output to match the input, with all quotes preserved.

PAL == PAL == NF == INCOME"."Taxable
PAL == PAL == NF == EXPENSES"."TotalExpenses
PAL == PAL == NF == EXPENSES"."Exceptional

Expected Output

PAL == PAL == NF == "INCOME"."Taxable"
PAL == PAL == NF == "EXPENSES"."TotalExpenses"
PAL == PAL == NF == "EXPENSES"."Exceptional"

There are 3 best solutions below

1
On

Author of the library here. The thing is that "INCOME"."Taxable" is being handled as a quoted value, and it is treating the quotes between INCOME and Taxable as unescaped quotes.

It will basically try to "rescue" the value and find either a closing quote or a delimiter (determined by settings.setUnescapedQuoteHandling(...)).

In your case the easiest thing to do is to set your quote character to something like ' or even \0 if your input doesn't have to handle quoted values anyway. With this you should get "INCOME"."Taxable" as you expect.
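The suggestion above might look roughly like the following settings fragment. This is a sketch, assuming the same delimiter and line separator as in the question; it only changes the quote character via `setQuote` and is not a complete program.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.getFormat().setDelimiter(':');
// Assumption: '\0' never occurs in the input, so no value is treated as quoted
// and the double quotes stay in the parsed values as ordinary characters.
settings.getFormat().setQuote('\0');

CsvParser parser = new CsvParser(settings);
```

The rest of the question's code (the `parseAll` call and the printing loop) can stay as it is.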

Hope this helps

2
On

This looks like a combination of a bug in your code and a relaxation of the CSV spec in Univocity.

The input was

"INCOME"."Taxable"

Unfortunately, this is NOT valid CSV, because a quoted field may not contain unescaped quotes. The correct CSV encoding would have been

"INCOME"".""Taxable"

The Univocity library seems to have been non-strict about this and guessed that the input was meant to be a single string (since there was no delimiter inside it). So, after parsing, the internal value of that field was

INCOME"."Taxable

This is the actual contents of the string, without the outside quotes that are required to make it a string literal in Java.
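To make the escaping rule concrete, here is a minimal sketch of how a value with embedded quotes is conventionally encoded as a CSV field: wrap it in quotes and double every quote inside. The class and method names are hypothetical and not part of Univocity.

```java
public class CsvQuote {
    // Wraps a value in double quotes and doubles each embedded quote,
    // which is the usual CSV escaping convention.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The parsed field value from the example: INCOME"."Taxable
        String value = "INCOME\".\"Taxable";
        System.out.println(quote(value)); // prints "INCOME"".""Taxable"
    }
}
```

Note that this produces valid CSV, which is not the same as the raw input text the question wants to get back.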

Then when you wrote it out you neglected to add back the surrounding quotes, resulting in the output you see.

Summary:

  1. Univocity handled invalid input in a way that matches your requirements, so you're OK there.
  2. To fix your problem you have to put back the surrounding quotes yourself:

    int field = 0;
    for(String s:str) {
        if (++field == 4)
            System.out.print("\"" + s + "\"");
        else
            System.out.print(s + " == ");
    }
    

This also fixes the other bug of the extra trailing == delimiter.

2
On

You could do something like the following instead. I tested it and it gives the result you want; tweak it to fit your code.

Your data:

PAL : PAL : NF : "INCOME"."Taxable"
PAL : PAL : NF : "EXPENSES"."TotalExpenses"
PAL : PAL : NF : "EXPENSES"."Exceptional"

Code:

// Needs java.io.BufferedReader, java.io.FileReader and java.io.IOException.
public static void parseFile() {
    String csvFile = "file/User.csv";
    String line;
    try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
        while ((line = br.readLine()) != null) {
            // Swap the ':' delimiter for "=="; the quotes are left untouched.
            String equalString = line.replace(":", "==");
            System.out.println(" final : " + equalString);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Output:

 final : PAL == PAL == NF == "INCOME"."Taxable"
 final : PAL == PAL == NF == "EXPENSES"."TotalExpenses"
 final : PAL == PAL == NF == "EXPENSES"."Exceptional"
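If the individual fields are needed rather than just a rewritten line, the same idea can be sketched as a split and join. This assumes fields are separated by a colon with optional surrounding whitespace and never contain a colon themselves; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ColonSplit {
    // Splits on ':' and drops the whitespace around it; quotes are not interpreted.
    static String[] splitFields(String line) {
        return line.trim().split("\\s*:\\s*", -1);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "PAL : PAL : NF : \"INCOME\".\"Taxable\"",
            "PAL : PAL : NF : \"EXPENSES\".\"TotalExpenses\"");
        for (String line : lines) {
            // Join the untouched fields with the separator used in the question.
            System.out.println(String.join(" == ", splitFields(line)));
        }
    }
}
```

Because no quote handling is involved, a colon inside a quoted value would be split as well, so this only fits data shaped like the sample.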