Handling data containing comma in a CSV file

643 Views Asked by At

I have a CSV File that I am trying to read from Amazon S3 in Mosaic Decisions. This file contains an Address column in which the data itself contains a comma.

Example data in the file is shown below:

Address
sl,name,address
1,Ratan Kumar,FlatNo 122,Mumbai,Maharashtra

In this case, the address field is getting separated into 3 columns as- address, Missing_header_0, Missing_header_1 and the data is read as

sl,name,address,Missing_header_0, Missing_header_1
1,Ratan Kumar,FlatNo 122,Mumbai,Maharashtra

This corrupts the actual data and overrides the next column data How can we avoid this scenario?

2

There are 2 best solutions below

0
On BEST ANSWER

To avoid this scenario,

  1. Open the Reader node configuration
  2. Pass a single quote (') or double quote (") in the Quote text box available in the configuration tab

This feature of Mosaic Decisions allows wrapping the data in each field with quotes.

This would give the desired outcome.

Refer to the screenshot

2
On

Fields containing a separator should be enclosed in double quotes:

sl,name,address
1,Ratan Kumar,"FlatNo 122,Mumbai,Maharashtra"

If you have no control over the creation of this file you could either contact the creator and ask to fix a malformed csv file, or write some custom code/script to parse the first 2 fields and treat the remainder of the line as the third field (if the address field is indeed the last field).