I have two CSV files that are basically the same, but for some reason SmarterCSV cannot read the one named bad_file. Here is a Gist of both files. Ruby's native CSV library can read bad_file with no problem.
Before processing each file, I strip everything above the header row using the code below:
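require 'tempfile'

def self.clean(file)
  # gsub! returns nil when nothing was replaced, so unchanged files fall through to the else branch
  if (csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, ''))
    tempfile = Tempfile.new('file_name')
    tempfile.write(csv)
    tempfile.flush # make sure the contents are on disk before the file is reopened
    tempfile
  else
    file
  end
end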
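As a possible alternative to the regex, here is a minimal line-based sketch. It assumes, as the code above does, that the header row starts with `Date,`. The names `strip_preamble` and `clean_to_tempfile` are hypothetical helpers, not part of SmarterCSV.

```ruby
require 'tempfile'

# Hypothetical helper: returns the CSV text starting at the header row.
# Assumes the header row begins with "Date," as in the question.
def strip_preamble(text, header_prefix = 'Date,')
  text = text.sub(/\A\uFEFF/, '') # drop a leading UTF-8 BOM, if present
  lines = text.lines.drop_while { |line| !line.start_with?(header_prefix) }
  lines.empty? ? text : lines.join # no header row found: return the text unchanged
end

# Hypothetical helper: writes the cleaned text to a Tempfile and returns it.
def clean_to_tempfile(path)
  text = File.read(path, encoding: 'bom|utf-8')
  tempfile = Tempfile.new(['cleaned', '.csv'])
  tempfile.write(strip_preamble(text))
  tempfile.flush  # push the contents to disk
  tempfile.rewind # let callers read from the start
  tempfile
end
```

Unlike the `gsub!` version, this always returns a Tempfile, so the caller does not have to handle two different return types.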
I then pass that file into SmarterCSV like this:
File.open(file, encoding: 'bom|utf-8') do |f|
  chunk = SmarterCSV.process(f, {
    verbose: true,
    remove_empty_hashes: true,
    col_sep: :auto,
    force_utf8: true,
    force_simple_split: true,
    strip_chars_from_headers: /[\-"\xEF\xBB\xBF]/,
    duplicate_header_suffix: ''
  })
end
I cannot figure out what is even different about the CSV files, let alone why SmarterCSV can't process the bad one. Also, if anyone has a better method for stripping the unneeded info from the top of the spreadsheet, that could solve this problem right there.
There was an issue with BOM markers in CSV files; see issue 219.
This issue has been fixed as of SmarterCSV version 1.8.0.
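If upgrading is not an option right away, it may help to confirm whether a byte order mark is what distinguishes the two files. The following is a minimal sketch using only core Ruby; `utf8_bom?` is a hypothetical helper name, and the file names in the usage note are assumed from the question.

```ruby
# The three bytes of the UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file starts with the UTF-8 BOM.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end
```

For example, calling `utf8_bom?('bad_file.csv')` and `utf8_bom?('good_file.csv')` would show whether only one of the files carries the marker.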