Table files transferred between servers flag table as crashed

At work we have a web site that uses large data sets, load balanced between two MySQL 5.6.16-64.2 servers using MyISAM, running on Linux (2.6.32-358.el6.x86_64 GNU/Linux). The data is updated hourly from a text-based file set received from an MS-SQL database. To avoid disrupting reads on the web site, and at the same time make sure the updates don't take too long, the following process was put in place:

Have the data on a third Linux box (used only for data update processing), update the different data tables as needed there, move a copy of the physical table files to the production servers under a temporary name, and then do a table swap with MySQL's RENAME TABLE.
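
On the production side the swap step then amounts to something like the following; a minimal sketch where the table names are placeholders, not the actual names used:

-- swap the freshly copied table in and the old one out in one atomic statement
RENAME TABLE data_table TO data_table_old,
             data_table_tmp TO data_table;
-- the previous version can then be dropped
DROP TABLE data_table_old;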

But every time, the table (under the temporary name) is seen by the destination MySQL servers as crashed and requires repair. The repair takes too long, so forcing a repair before the table swap is not an option.

The processing is programmed in Ruby 1.8.7, with a thread for each server (just as an FYI, this also happens without threads, copying to a single server).

The steps to perform the file copy are as follows:

Use Net::SFTP to transfer the files to a destination folder that is not the database folder (done due to permissions). Code example of the file transfer for the main table files (if the table also has partition files they are transferred separately, and rcpFile is assigned differently to match the temporary name). For speed the files are uploaded in parallel:

require 'net/sftp'

# Upload all table files in parallel, each under its temporary name
Net::SFTP.start(host_server, host_user, :password => host_pwd) do |sftp|
  uploads = fileList.map { |f|
    # e.g. "table.MYD" becomes "table#{prcExt}.MYD"
    rcpFile = File.basename(f, File.extname(f)) + prcExt + File.extname(f)
    sftp.upload(f, "/home/#{host_user}/#{rcpFile}")
  }
  uploads.each { |u| u.wait }
end

Then assign the mysql user and group as owner of the files and move them into the MySQL database folder, using Net::SSH to execute sudo shell commands:

require 'net/ssh'

# Hand the files to the mysql user/group and move them into the database folder
Net::SSH.start(host_server, host_user, :port => host_port.to_i, :password => host_pwd) do |ssh|
  doSSHCommand(ssh, "sudo sh -c 'chown mysql /home/#{host_user}/#{prcLocalFiles}'", host_pwd)
  doSSHCommand(ssh, "sudo sh -c 'chgrp mysql /home/#{host_user}/#{prcLocalFiles}'", host_pwd)
  doSSHCommand(ssh, "sudo sh -c 'mv /home/#{host_user}/#{prcLocalFiles} #{host_path}'", host_pwd)
end

The doSSHCommand method:

# Run a command over the given SSH session, answering the sudo password
# prompt if one appears, and return the command's output.
def doSSHCommand(ssh, cmd, pwd)
  result = ""
  ssh.open_channel do |channel|
    channel.request_pty do |c, success|
      raise "could not request pty" unless success

      channel.exec "#{cmd}" do |c, success|
        raise "could not execute command '#{cmd}'" unless success

        channel.on_data do |c, data|
          # Answer the sudo password prompt; otherwise collect the output
          if (data[/\[sudo\]|Password/i]) then
            channel.send_data "#{pwd}\n"
          else
            result += data unless data.nil?
          end
        end
      end
    end
  end
  ssh.loop

  result
end

If the same thing is done manually, using scp to transfer the files, doing the owner/group changes, and moving the files, the table is never marked as crashed. Comparing file sizes between scp and Net::SFTP shows no difference.

Other process methods have been tried, but in our experience they take too long compared to the method described above. Does anyone have an idea why the tables are being marked as crashed, and is there a solution that avoids the crash without having to do a table repair?

There are 3 answers below.

Accepted answer:

The issue was found and solved:

The process database had its table files copied from one of the production databases. The tables did not show as crashed on the process server, and there were no issues when querying and updating the data.

While searching the web, the following SO answer was found: MySQL table is marked as crashed

The guess was that when the tables were copied from production to the process server, the header info stayed the same and interfered when the files were copied back to the production servers during processing. So the tables were repaired on the process server, and a few tests were run on our staging environment, where the issue was also experienced. Sure enough, that corrected the issue.

So the final solution was to repair the tables once on the process server before having the process script run hourly.
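
In MySQL terms that one-time fix is just a repair of each affected table on the process server; a minimal sketch with a placeholder table name:

-- run once on the process server for every table whose files get shipped out
CHECK TABLE data_table;
REPAIR TABLE data_table;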

Answer:

I see you've already found an answer, but two things struck me about this question.

One, you should look at rsync, which gives you many more options, not the least of which is a speedier transfer, and may better suit this problem. File transfer between servers is basically why rsync exists.

Second, and I'm not trying to re-engineer your system, but you may have outgrown MySQL; it may not be the best fit for this problem. This problem may be better served by Riak, where you have multiple nodes, or Mongo, where you can deal with large files and have multiple nodes. Just two thoughts I had while reading your question.

Answer:

The tables are marked as crashed because you're probably getting race conditions as you copy the files. That is, there are writes pending to the tables while your Ruby script runs, so the resulting copy is incomplete.

The safer way to copy MyISAM tables is to run the SQL commands FLUSH TABLES followed by FLUSH TABLES WITH READ LOCK first, to ensure that all pending changes are written to the table on disk, and then block any further changes until you release the table lock. Then perform your copy script, and then finally unlock the tables.
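
As a sketch, the sequence on the source server would look something like this, with the actual file copy happening from a shell while the lock is held:

-- flush pending writes to disk, then block further writes with a global read lock
FLUSH TABLES;
FLUSH TABLES WITH READ LOCK;
-- keep this session open and copy the .frm/.MYD/.MYI files from the shell now
-- once the copy is finished, release the lock from this same session
UNLOCK TABLES;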

This does mean that no one can update the tables while you're copying them. That's the only way you can ensure you get uncorrupted files.

But I have to comment that it seems like you're reinventing MySQL replication. Is there any reason you're not using that? It could probably work faster, better, and more efficiently, incrementally and continually updating only the parts of the tables that have changed.
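
For what it's worth, pointing a production server at the processing box as a replication slave comes down to something like the sketch below; the host, credentials, and binlog coordinates are placeholders, and the processing box would need binary logging enabled first:

-- run once on each production server, after loading an initial copy of the data
CHANGE MASTER TO
  MASTER_HOST = 'process-box.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;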