The following is the code I have used. I am not able to delete the rows from Main.csv when the value of the "name" column in Main.csv equals the value of the "name" column in Sub.csv. Please help me with this. I know I am missing something. Thanks in advance.

require 'rubygems'
require 'smarter_csv'

main_csv = SmarterCSV.process('Main.csv', {:chunk_size => 100}) do |chunk|
  short_csv = SmarterCSV.process('Sub.csv', {:chunk_size => 100}) do |smaller_chunk|
    chunk.each do |each_ch|
      smaller_chunk.each do |small_each_ch|
        each_ch.delete_if { |k, v| v == small_each_ch[:name] }
      end
    end
  end
end

Tilo

This is a bit of a non-standard scenario for smarter_csv.

Sub.csv has 2000 rows, whereas Main.csv has around 1 million rows.

If all you need to decide is whether the name appears in both files, then you can do this:

1) Read the Sub.csv file first, and just store the values of name in an array sub_names.

2) Open an output file for the result.csv file.

3) Read the Main.csv file, processing it in chunks, and write the data for each row to the result.csv file if the name does not appear in the array sub_names.

4) Close the output file. Et voilà!
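
The four steps above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses Ruby's standard csv library instead of smarter_csv so that it runs without extra gems, it streams Main.csv row by row rather than in chunks, it assumes both files have a header row with a column spelled exactly name, and the method name filter_main_csv is made up for this example. A Set is used in place of the plain array so that each lookup stays fast with around 1 million rows.

```ruby
require 'csv'
require 'set'

# Hypothetical helper (not part of smarter_csv): copies every row of
# main_path to result_path unless its "name" value also occurs in sub_path.
def filter_main_csv(main_path, sub_path, result_path)
  # Step 1: collect the names from the small file.
  sub_names = Set.new
  CSV.foreach(sub_path, headers: true) { |row| sub_names << row['name'] }

  # Step 2: open the output file. The block form closes it again (step 4).
  CSV.open(result_path, 'w') do |out|
    header_written = false

    # Step 3: stream the large file one row at a time.
    CSV.foreach(main_path, headers: true) do |row|
      # Copy the header line once, taken from the first data row.
      unless header_written
        out << row.headers
        header_written = true
      end

      # Keep the row only if its name is absent from Sub.csv.
      out << row.fields unless sub_names.include?(row['name'])
    end
  end
end

# Usage, with the file names from the question:
# filter_main_csv('Main.csv', 'Sub.csv', 'result.csv')
```

The same test (write the row unless sub_names contains its name) should carry over to the body of a SmarterCSV.process block, where each chunk is an array of hashes and the lookup key would be :name, as in the code from the question. Note that this approach writes a new file: calling delete_if on a row hash only changes that hash in memory and never touches Main.csv on disk.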