Using the smarter_csv gem to process CSV in chunks: I need to delete rows from a large CSV (2 GB) by comparing key/values with another CSV (1 GB)

1k Views Asked by Hayz At 09 January 2017 at 03:04

The following is the code I have used. I am not able to delete rows from Main.csv when the value of the "name" column in Main.csv equals the value of the "name" column in Sub.csv. Please help me with this; I know I am missing something. Thanks in advance.

require 'rubygems'
require 'smarter_csv'

main_csv = SmarterCSV.process('Main.csv', {:chunk_size => 100}) do |chunk|
  short_csv = SmarterCSV.process('Sub.csv', {:chunk_size => 100}) do |smaller_chunk|
    chunk.each do |each_ch|
      smaller_chunk.each do |small_each_ch|
        each_ch.delete_if{|k,v| v == small_each_ch[:name]}
      end
    end
  end
end


Tilo On 28 January 2018 at 18:25

It's a bit of a non-standard scenario for smarter_csv.

Sub.csv has 2000 rows, whereas Main.csv has around 1 million rows.

If all you need to decide is whether the name appears in both files, then you can do this:

1) read the Sub.csv file first, and just store the values of name in an array sub_names

2) open an output file for the result.csv file

3) read the Main.csv file, with processing in chunks, and write the data for each row to the result.csv file, if the name does not appear in the array sub_names

4) close the output file - et voilà!
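
The four steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a tested solution for your files: the method names (read_options, load_sub_names, rows_to_keep, filter_main) are made up for illustration, the chunk size of 1000 is arbitrary, and it uses a Set rather than the plain array mentioned in step 1 so that the membership check stays fast. It also assumes both files have a column that smarter_csv exposes as :name, and it turns off smarter_csv's default removal of empty values and numeric conversion so that each row can be written back with all of its columns.

```ruby
require 'csv'
require 'set'

# Hypothetical helper: options passed to SmarterCSV.process for both files.
def read_options
  {
    chunk_size: 1000,                 # arbitrary; tune to available memory
    remove_empty_values: false,       # keep every column key in every row
    convert_values_to_numeric: false  # leave values as strings
  }
end

# Step 1: collect the values of the name column from the smaller file.
def load_sub_names(sub_path)
  require 'smarter_csv' # loaded lazily; assumes the gem is installed
  names = Set.new
  SmarterCSV.process(sub_path, read_options) do |chunk|
    chunk.each { |row| names << row[:name] }
  end
  names
end

# Keep only the rows of one chunk whose :name is not in sub_names.
def rows_to_keep(chunk, sub_names)
  chunk.reject { |row| sub_names.include?(row[:name]) }
end

# Steps 2 to 4: open the output file, stream the big file in chunks,
# write the surviving rows, and let the block form of CSV.open close the file.
def filter_main(main_path, sub_path, result_path)
  require 'smarter_csv'
  sub_names = load_sub_names(sub_path)
  headers = nil
  CSV.open(result_path, 'w') do |out|
    SmarterCSV.process(main_path, read_options) do |chunk|
      if headers.nil? && !chunk.empty?
        headers = chunk.first.keys   # symbol keys as produced by smarter_csv
        out << headers.map(&:to_s)
      end
      rows_to_keep(chunk, sub_names).each do |row|
        out << row.values_at(*headers)
      end
    end
  end
end
```

Calling filter_main('Main.csv', 'Sub.csv', 'result.csv') would leave Main.csv untouched and write the rows that survive to result.csv. One thing to be aware of: the header row in result.csv is written from the keys smarter_csv produces, which are normalized versions of the original headers, so they may not match the header text in Main.csv exactly.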
