Batching when using ActiveRecord::Base.connection.execute

I am busy writing a migration that will allow us to move our yamler from Syck to Psych and finally upgrade our project to Ruby 2. The migration is going to be seriously resource intensive, though, so I am going to need to use chunking.

I wrote the following method to confirm that the transformation I plan to use produces the expected result and can be run without downtime. To avoid ActiveRecord performing the serialization automatically, I needed to use ActiveRecord::Base.connection.execute.
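
To make the difference concrete, here is a minimal sketch (the Preference model and its value column are made-up names): reading through a model whose column is declared with serialize hands back an already-deserialized object, whereas raw SQL returns the stored YAML string untouched.

class Preference < ActiveRecord::Base
  serialize :value          # ActiveRecord runs the yamler on reads and writes
end

Preference.find(1).value    # => a deserialized Ruby object (yamler already applied)

ActiveRecord::Base.connection.select_value(
  "SELECT value FROM preferences WHERE id = 1"
)                           # => the raw "--- ..." YAML string, exactly as stored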

The method that describes the transformation is as follows:

def show_summary(table, column_name)
  # Raw SQL so ActiveRecord does not deserialize the column for us.
  result = ActiveRecord::Base.connection.execute <<-SQL
    SELECT id, #{column_name} FROM #{table}
  SQL
  all_rows = result.to_a

  # Rows whose YAML does not survive a Syck -> Psych -> Syck round trip.
  problem_rows = all_rows.select do |row|
    original_string = Syck.dump(Syck.load(row[1]))
    original_object = Syck.load(original_string)

    new_string = Psych.dump(original_object)
    new_object = Syck.load(new_string)

    Syck.dump(new_object) != original_string rescue true
  end

  problem_rows.map do |row|
    old_string = Syck.dump(Syck.load(row[1]))
    new_string = Psych.dump(Syck.load(old_string)) rescue "Parse failure"
    roundtrip_string = begin
      Syck.dump(Syck.load(new_string))
    rescue => e
      e.message
    end

    new_row = {}
    new_row[:id] = row[0]
    new_row[:original_encoding] = old_string
    new_row[:new_encoding] = roundtrip_string
    new_row
  end
end
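
Called from a Rails console, it returns one hash per problem row; for example (the table and column names here are placeholders):

show_summary('users', 'preferences')
# => [{ :id => 42,
#       :original_encoding => "--- ...",
#       :new_encoding      => "--- ..." }, ...]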

How can you use batching when making use of ActiveRecord::Base.connection.execute?

For completeness, my update function is as follows:

  # Migrate the given serialized YAML column from Syck to Psych
  # (if any).
  def migrate_to_psych(table, column)
    table_name = ActiveRecord::Base.connection.quote_table_name(table)

    column_name = ActiveRecord::Base.connection.quote_column_name(column)

    fetch_data(table_name, column_name).each do |row|
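      # `convert` is defined in the legacy_yamler file linked below; it is not shown here.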
      transformed = ::Psych.dump(convert(Syck.load(row[column])))

      ActiveRecord::Base.connection.execute <<-SQL
         UPDATE #{table_name}
         SET #{column_name} = #{ActiveRecord::Base.connection.quote(transformed)}
         WHERE id = #{row['id']};
      SQL
    end
  end

  def fetch_data(table_name, column_name)
    ActiveRecord::Base.connection.select_all <<-SQL
       SELECT id, #{column_name}
       FROM #{table_name}
       WHERE #{column_name} LIKE '---%'
    SQL
  end

I adapted these from http://fossies.org/linux/openproject/db/migrate/migration_utils/legacy_yamler.rb.
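
For reference, a sketch of how such helpers would be wired into a migration; the class, table, and column names below are placeholders, and it assumes migrate_to_psych, fetch_data and the convert helper from the linked file are available to the migration class:

class MigrateSerializedColumnsToPsych < ActiveRecord::Migration
  def up
    # One call per serialized column; table/column names are placeholders.
    migrate_to_psych 'users',    'preferences'
    migrate_to_psych 'settings', 'value'
  end

  def down
    # No data rollback; the conversion is treated as one-way here.
  end
end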

Best answer:

You can easily build something with SQL's LIMIT and OFFSET clauses:

def fetch_data(table_name, column_name)
  batch_size, offset = 1000, 0
  begin
    batch = ActiveRecord::Base.connection.select_all <<-SQL
      SELECT id, #{column_name}
      FROM #{table_name}
      WHERE #{column_name} LIKE '---%'
      ORDER BY id          -- keeps the batch order stable across queries
      LIMIT #{batch_size}
      OFFSET #{offset}
    SQL
    batch.each do |row|
      yield row
    end
    offset += batch_size
  end until batch.empty?
end

which you can use almost exactly as before, just without the .each:

fetch_data(table_name, column_name) do |row| ... end
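
One caveat: as the offset grows, the database still has to walk past all the skipped rows on every query, so on very large tables a keyset-style variant that paginates on the primary key can be noticeably cheaper. A minimal sketch along the same lines, assuming an integer id primary key:

def fetch_data_keyset(table_name, column_name, batch_size = 1000)
  last_id = 0
  loop do
    rows = ActiveRecord::Base.connection.select_all(<<-SQL).to_a
      SELECT id, #{column_name}
      FROM #{table_name}
      WHERE #{column_name} LIKE '---%'
        AND id > #{last_id.to_i}
      ORDER BY id
      LIMIT #{batch_size}
    SQL
    break if rows.empty?
    rows.each { |row| yield row }
    last_id = rows.last['id'].to_i
  end
end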

HTH!