SmarterCSV: "end of file reached", but Ruby's CSV library can process the file


I have two CSV files that are basically the same, but for some reason SmarterCSV cannot read the one named bad_file. Here is a Gist of both files. Ruby's native CSV library can read bad_file without any problem.

Before processing each file, I strip everything above the header row using the code below:

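  require 'tempfile'

  # Remove everything above the "Date," header row.
  # Returns a Tempfile holding the cleaned data, or the original path if
  # nothing was stripped (gsub! returns nil when the regex does not match).
  def self.clean(file)
    if (csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, ''))
      tempfile = Tempfile.new('file_name')
      tempfile.write(csv)
      tempfile.flush # push buffered data to disk before the file is reopened by path
      tempfile
    else
      file
    end
  end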
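An alternative to slurping the whole file and running a multiline regex is to scan it line by line and start copying at the header row. The sketch below assumes, like the regex, that the header row begins with `Date,` and that the input is read as UTF-8; `strip_preamble`, `HEADER_PREFIX`, and `BOM` are hypothetical names, not part of SmarterCSV.

```ruby
require 'stringio'

# Hypothetical names: HEADER_PREFIX, BOM, strip_preamble.
HEADER_PREFIX = 'Date,'.freeze # assumed header start, same as the regex above
BOM = "\uFEFF".freeze          # UTF-8 byte order mark as a character

# Copies lines from +io+ into an in-memory StringIO, starting at the first
# line that begins with HEADER_PREFIX. A BOM at the start of a line is
# removed before the check (in practice only the first line can carry one).
# Raises ArgumentError if no header row is found.
def strip_preamble(io)
  out = StringIO.new(String.new(encoding: Encoding::UTF_8))
  found = false
  io.each_line do |line|
    line = line.delete_prefix(BOM)
    found ||= line.start_with?(HEADER_PREFIX)
    out << line if found
  end
  raise ArgumentError, "no #{HEADER_PREFIX.inspect} header row found" unless found

  out.rewind # position at the header row for the next reader
  out
end
```

Since the question already passes an open `File` to `SmarterCSV.process`, the returned `StringIO` (opened from `File.open(file, encoding: 'bom|utf-8')`) should be usable the same way, which would avoid the temp file entirely.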

I then pass that file into SmarterCSV like this:

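    require 'smarter_csv'

    # 'bom|utf-8' makes Ruby skip a UTF-8 byte order mark at the start of the file.
    # Assigning the block's result keeps the parsed rows available after the block.
    chunk = File.open(file, encoding: 'bom|utf-8') do |f|
      SmarterCSV.process(f, {
                           verbose: true,
                           remove_empty_hashes: true,
                           col_sep: :auto,
                           force_utf8: true,
                           force_simple_split: true,
                           strip_chars_from_headers: /[\-"\xEF\xBB\xBF]/,
                           duplicate_header_suffix: ''
                         })
    end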

I can't figure out what is even different about the two CSV files, let alone why SmarterCSV can't process the bad one. Also, if anyone has a better method for stripping the unneeded info from the top of the spreadsheet, that alone could solve this problem.
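To find out what actually differs between the two files, it can help to compare byte-level traits that CSV parsers are sensitive to: a leading BOM, line-ending style, invalid UTF-8, NUL bytes, and an odd number of double quotes. An odd quote count usually means an unterminated quoted field, which often makes a parser keep reading until it hits end of file. `csv_fingerprint` is a hypothetical helper, not part of SmarterCSV or Ruby's CSV library.

```ruby
# Hypothetical diagnostic: summarise byte-level traits that commonly make
# two "identical-looking" CSV files parse differently.
def csv_fingerprint(data)
  bytes = data.b # binary copy, so nothing is decoded or transcoded
  {
    bom: bytes.start_with?("\xEF\xBB\xBF".b),       # UTF-8 byte order mark
    crlf: bytes.scan("\r\n".b).size,                 # Windows line endings
    lone_cr: bytes.scan(/\r(?!\n)/).size,            # old Mac line endings
    lone_lf: bytes.scan(/(?<!\r)\n/).size,           # Unix line endings
    valid_utf8: bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?,
    trailing_newline: bytes.end_with?("\n".b),
    nul_bytes: bytes.count("\x00"),
    even_quote_count: bytes.count('"').even?         # false hints at an unclosed quote
  }
end
```

Running it on both files, for example `pp csv_fingerprint(File.binread('bad_file.csv'))` (the `.csv` file names are assumed), and comparing the two hashes should narrow down which trait differs.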

Answer by Tilo:

There was an issue with byte order marks (BOMs) in CSV files; see issue 219.

This has been fixed since SmarterCSV version 1.8.0.
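Since the fix landed in 1.8.0, one way to avoid the problem is to make sure at least that version is loaded. Below is a minimal sketch of a runtime guard; `smarter_csv_bom_fix?` and `MIN_BOM_FIX_VERSION` are hypothetical names. It compares with `Gem::Version` because plain string comparison gets multi-digit releases wrong (`'1.10.0' < '1.8.0'` is true as strings).

```ruby
require 'rubygems'

# Hypothetical constant: the release that, per the answer above,
# fixed the BOM issue (issue 219).
MIN_BOM_FIX_VERSION = Gem::Version.new('1.8.0')

# Hypothetical helper: true when +version+ is 1.8.0 or later.
def smarter_csv_bom_fix?(version)
  Gem::Version.new(version) >= MIN_BOM_FIX_VERSION
end
```

In an application this would typically go alongside a Gemfile constraint such as `gem 'smarter_csv', '>= 1.8.0'`, and be called as `smarter_csv_bom_fix?(SmarterCSV::VERSION)` after `require 'smarter_csv'`, assuming the gem exposes `SmarterCSV::VERSION` as its 1.x releases do.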