Why do I have a trailing column when reading a CSV file?


I have a CSV file with the following structure:

"customer_id";"customer_name";"quantity";
"id1234";"Henry";"15";

Parsing with Ruby's standard CSV lib:

require 'csv'

csv_data = CSV.read(pathtofile,
    :headers => :first_row,
    :col_sep => ";",
    :quote_char => '"',
    :row_sep => "\r\n" # setting it to "\r" or "\n" results in MalformedCSVError
)

puts csv_data.headers.count #4

I don't understand why the parsing seems to result in four columns although the file only contains three. Is this not the right approach to parse the file?


There are 2 best solutions below

1
On BEST ANSWER

The ; at the end of each row implies another field, even though that field has no value.
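A quick way to see this is to parse a single row on its own. This is only a minimal sketch that uses an inline string in place of the original file:

```ruby
require 'csv'

# One data row from the question, including the trailing separator.
line = '"id1234";"Henry";"15";'

fields = CSV.parse_line(line, col_sep: ";", quote_char: '"')

# The separator after "15" opens a fourth, empty field, which CSV reports as nil.
p fields       # => ["id1234", "Henry", "15", nil]
p fields.size  # => 4
```

The same thing happens on the header row, which is why headers.count reports 4.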

I would either remove the trailing ;'s or just ignore the fourth field when it is parsed.
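Removing the trailing separators before parsing could look roughly like the sketch below. The sample string stands in for the file contents and assumes "\r\n" line endings; the regular expression is an illustration, not part of the CSV library, and it would also strip a ; that sits directly before a line break inside a quoted field, so it only suits simple files like the one in the question.

```ruby
require 'csv'

# Sample data standing in for the file from the question (assumed "\r\n" line endings).
raw = "\"customer_id\";\"customer_name\";\"quantity\";\r\n" \
      "\"id1234\";\"Henry\";\"15\";\r\n"

# Remove a ";" that is immediately followed by a line ending or the end of the input.
cleaned = raw.gsub(/;(?=\r?\n|\z)/, "")

table = CSV.parse(cleaned, headers: :first_row, col_sep: ";", quote_char: '"')

p table.headers         # => ["customer_id", "customer_name", "quantity"]
p table[0]["quantity"]  # => "15"
```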

0
On

The trailing ; is the culprit.

You can preprocess the file, stripping the trailing ;, but that incurs unnecessary overhead.

You can post-process the returned array of data from CSV using something like this:

csv_data = CSV.read(...).each(&:pop)

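That will iterate over the sub-arrays, removing the last element in each. This assumes the rows are plain arrays, as they are when the :headers option is not set. The problem is that read loads the entire file into memory, so it doesn't scale to large files. You might want to rethink using it and instead use CSV.foreach to read the file line by line, popping the last value from each row as it is returned to you.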
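As a rough sketch of the CSV.foreach approach: the temporary file below exists only to make the example self-contained, and the rows array is there purely to show the result. With a large file you would process each row inside the block instead of collecting them.

```ruby
require 'csv'
require 'tempfile'

# Write sample data to a temporary file so the sketch is self-contained.
file = Tempfile.new(['customers', '.csv'])
file.binmode
file.write(%("customer_id";"customer_name";"quantity";\r\n"id1234";"Henry";"15";\r\n))
file.close

rows = []
CSV.foreach(file.path, col_sep: ";", quote_char: '"') do |row|
  row.pop      # drop the empty field created by the trailing ";"
  rows << row  # collected here only for demonstration
end

p rows  # => [["customer_id", "customer_name", "quantity"], ["id1234", "Henry", "15"]]

file.unlink
```

Without the :headers option each row is a plain Array, so pop is available on it.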