Parsing a CSV file with non-normalized data and an overloaded delimiter in Node.js


My aim is to parse a CSV dataset in which some rows contain non-normalized data enclosed in double quotes (""). I cannot simply split each line on ";" because that character is also used inside the quoted data.

Is there an easy way to solve this?

Some rows contain non-normalized data in the "goes_to" column, others in the "comes_from" column (see the rows for 'John' and 'David' below). These values are themselves separated by ";", the same character used as the column delimiter, and that is what causes the problem.

Name;goes_to;comes_from
Peter;;London
Ruth;Boston;
Brandon;;
John;;"Bern;Madrid;Tel Aviv"
David;"New York;Paris;Berlin";
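A quick illustration of the problem, using the 'John' row: `String.prototype.split` has no notion of quoting, so the quoted list is torn into separate pieces and the field count no longer matches the three header columns.

```javascript
// Header and one problematic row from the sample above.
const header = "Name;goes_to;comes_from".split(";");
const johnRow = 'John;;"Bern;Madrid;Tel Aviv"';

// A naive split treats every ";" as a column separator, including the ones inside quotes.
const naiveFields = johnRow.split(";");

console.log(header.length);      // 3
console.log(naiveFields.length); // 5
console.log(naiveFields);        // [ 'John', '', '"Bern', 'Madrid', 'Tel Aviv"' ]
```

With five fields for three columns, position alone can no longer tell you which column "Madrid" belongs to.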

Eventually, the aim is to normalize the data and put it into two separate multimap structures, so that I can access each column's data individually:

comes_from_multimap.get('John'); >>> ['Bern', 'Madrid', 'Tel Aviv']
goes_to_multimap.get('David'); >>> ['New York', 'Paris', 'Berlin']
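JavaScript has no built-in multimap, so a minimal sketch, assuming a `Map` whose values are arrays is an acceptable stand-in, could look like this. `addToMultimap` is a hypothetical helper name, and the quoted cell contents are hard-coded here only to show the target shape.

```javascript
// A Map whose values are arrays serves as a simple multimap.
// addToMultimap is a hypothetical helper used only for this illustration.
function addToMultimap(multimap, key, values) {
  if (!multimap.has(key)) {
    multimap.set(key, []);
  }
  multimap.get(key).push(...values);
}

const comes_from_multimap = new Map();
const goes_to_multimap = new Map();

// Normalizing a quoted cell means splitting its content on ";" once the quotes are gone.
addToMultimap(comes_from_multimap, "John", "Bern;Madrid;Tel Aviv".split(";"));
addToMultimap(goes_to_multimap, "David", "New York;Paris;Berlin".split(";"));

console.log(comes_from_multimap.get("John")); // [ 'Bern', 'Madrid', 'Tel Aviv' ]
console.log(goes_to_multimap.get("David"));   // [ 'New York', 'Paris', 'Berlin' ]
```

Pushing into the existing array (rather than overwriting it) means a name that appears on several rows accumulates all of its values.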

I use a line reader to read the CSV line by line, and I manage to extract the string between the double quotes with the code below, to decide whether a line needs normalization. If a row contains non-normalized data, I would then loop over its values. However, with this approach I lose the information about whether the data came from the "goes_to" or the "comes_from" column, because my code only returns the text between the two quotes, without the context of where it came from.

const nonNormalizedSubString = line.substring(line.indexOf("\"") + 1, line.lastIndexOf("\""));
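One way to keep the column context without a library is to walk the line character by character and only treat ";" as a delimiter when you are outside double quotes. The sketch below assumes simple data like the sample: `splitQuotedLine` is a hypothetical helper, and it does not handle escaped quotes (`""`) or quoted fields that span several lines. Because it returns one entry per column, the array index tells you which column each value came from.

```javascript
// Splits one line on a delimiter, treating text inside double quotes as part of a single field.
// splitQuotedLine is hypothetical; it ignores escaped quotes ("") and multi-line fields.
function splitQuotedLine(line, delimiter = ";") {
  const fields = [];
  let current = "";
  let inQuotes = false;

  for (const ch of line) {
    if (ch === '"') {
      inQuotes = !inQuotes;      // toggle quoted state; the quote character itself is dropped
    } else if (ch === delimiter && !inQuotes) {
      fields.push(current);      // an unquoted delimiter ends the current field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);          // the last field has no trailing delimiter
  return fields;
}

const columns = "Name;goes_to;comes_from".split(";");
const fields = splitQuotedLine('John;;"Bern;Madrid;Tel Aviv"');

// The field count now matches the header, so the index identifies the source column.
fields.forEach((value, i) => console.log(columns[i], "=>", JSON.stringify(value)));
// Name => "John"
// goes_to => ""
// comes_from => "Bern;Madrid;Tel Aviv"
```

Once you know a value came from the comes_from column, splitting it on ";" gives the individual cities.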

Answer:

Try the csv-parse package (npm install csv-parse).

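Set ";" as the delimiter. The parser respects double quotes, so each row comes back as an array with exactly one element per column, and a quoted value such as "Bern;Madrid;Tel Aviv" stays in a single field. You can then use each element's position to determine which column it belongs to, split the quoted values on ";", and build the dataset you need: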

const csvParse = require('csv-parse')

const data = `Name;goes_to;comes_from
Peter;;London
Ruth;Boston;
Brandon;;
John;;"Bern;Madrid;Tel Aviv"
David;"New York;Paris;Berlin";`

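// csv-parse honours the double quotes, so "Bern;Madrid;Tel Aviv" arrives as one field
csvParse.parse(data, {
    delimiter: ';',
    trim: true
}, (err, records) => {
    if (err) throw err;
    // records: one array of strings per row, header row first
    console.log(records);
});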
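From there, the position-based mapping could be sketched as follows. The `records` literal is written out by hand to mirror the array-of-arrays shape the callback receives for the sample data (so the sketch runs without csv-parse installed), and `buildMultimaps` and `splitCell` are hypothetical helper names. Each cell is split on ";" only after the parser has already separated the columns.

```javascript
// Hand-written stand-in for the parsed rows: one array of strings per row, header first.
const records = [
  ["Name", "goes_to", "comes_from"],
  ["Peter", "", "London"],
  ["Ruth", "Boston", ""],
  ["Brandon", "", ""],
  ["John", "", "Bern;Madrid;Tel Aviv"],
  ["David", "New York;Paris;Berlin", ""],
];

// Splits one cell into its ";"-separated values, trimming and dropping empty pieces.
const splitCell = (cell) => cell.split(";").map((s) => s.trim()).filter(Boolean);

// Builds two multimaps (Map of name -> array of cities) from the parsed rows.
function buildMultimaps(rows) {
  const goesTo = new Map();
  const comesFrom = new Map();

  // slice(1) skips the header; destructuring by position assigns each field to its column.
  for (const [name, goesToCell, comesFromCell] of rows.slice(1)) {
    // Append to any existing values so repeated names accumulate.
    goesTo.set(name, [...(goesTo.get(name) ?? []), ...splitCell(goesToCell)]);
    comesFrom.set(name, [...(comesFrom.get(name) ?? []), ...splitCell(comesFromCell)]);
  }
  return { goesTo, comesFrom };
}

const { goesTo: goes_to_multimap, comesFrom: comes_from_multimap } = buildMultimaps(records);

console.log(comes_from_multimap.get("John")); // [ 'Bern', 'Madrid', 'Tel Aviv' ]
console.log(goes_to_multimap.get("David"));   // [ 'New York', 'Paris', 'Berlin' ]
```

Inside the real callback you would call `buildMultimaps(records)` after the error check instead of using the hand-written literal. Rows with empty cells (such as Brandon) end up mapped to empty arrays.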