My aim is to parse a CSV dataset in which some rows contain non-normalized data enclosed in double quotes. I cannot simply split each line on ";" because that character also appears inside the quoted data.
Is there an easy way to solve this?
Some rows contain non-normalized data in the "goes_to" column, others in the "comes_from" column (see the rows for 'John' and 'David' below). The quoted data uses ";" as its own separator, which is what causes the problem.
Name;goes_to;comes_from
Peter;;London
Ruth;Boston;
Brandon;;
John;;"Bern;Madrid;Tel Aviv"
David;"New York;Paris;Berlin";
Eventually, the aim is to normalize the data and put it into two separate multimap structures, so that I can access each column's data individually.
comes_from_multimap.get('John'); >>> ['Bern', 'Madrid', 'Tel Aviv']
goes_to_multimap.get('David'); >>> ['New York', 'Paris', 'Berlin']
I use a line reader to read the CSV line by line, and I manage to extract the string between the quotation marks with the following code, which lets me decide whether a line needs normalisation. If the row contains non-normalized data, I would then loop over the values. With this approach, however, I lose the information about whether the data came from the "goes_to" or the "comes_from" column, because my code only returns the text between the two quotation marks without the context it came from.
nonNormalizedSubString = line.substring(line.indexOf("\"") + 1, line.lastIndexOf("\""));
Try csv-parser and set ";" as the delimiter. You'll get an array of fields for each row, so you can use an element's position to determine which column it belongs to and build the dataset you need.
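To illustrate the position-based idea, here is a minimal sketch that uses only the Java standard library rather than csv-parser's own API (which is not shown here). The class name QuotedCsvSketch and the helpers splitLine and addValues are hypothetical names chosen for this example. It assumes every row has exactly three columns and that quoted fields never contain escaped quotes or line breaks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QuotedCsvSketch {

    // Splits one line on ';' but ignores delimiters that appear inside double quotes.
    // Assumption: no escaped quotes ("") inside a quoted field.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle state, drop the quote itself
            } else if (c == ';' && !inQuotes) {
                fields.add(current.toString());  // field boundary outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());          // last field (may be empty)
        return fields;
    }

    // Normalizes one cell: each ';'-separated value becomes its own entry under 'name'.
    static void addValues(Map<String, List<String>> multimap, String name, String cell) {
        if (cell.isEmpty()) {
            return;                              // nothing to store for empty cells
        }
        for (String value : cell.split(";")) {
            multimap.computeIfAbsent(name, k -> new ArrayList<>()).add(value.trim());
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Name;goes_to;comes_from",
            "Peter;;London",
            "Ruth;Boston;",
            "Brandon;;",
            "John;;\"Bern;Madrid;Tel Aviv\"",
            "David;\"New York;Paris;Berlin\";");

        Map<String, List<String>> goesTo = new LinkedHashMap<>();
        Map<String, List<String>> comesFrom = new LinkedHashMap<>();

        // Skip the header row; the field index tells us which column a value came from.
        for (String line : lines.subList(1, lines.size())) {
            List<String> fields = splitLine(line);
            String name = fields.get(0);
            addValues(goesTo, name, fields.get(1));     // column 1 = goes_to
            addValues(comesFrom, name, fields.get(2));  // column 2 = comes_from
        }

        System.out.println(comesFrom.get("John"));  // [Bern, Madrid, Tel Aviv]
        System.out.println(goesTo.get("David"));    // [New York, Paris, Berlin]
    }
}
```

Because the column is identified by index before the quoted cell is split again, the goes_to / comes_from context is never lost. A real CSV library is still the safer choice for files with escaped quotes or multi-line fields, which this sketch does not handle.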