Most efficient way to migrate a MySQL database while altering the primary key data type

284 Views Asked by At

Since MySQL 5.6 support is coming to an end next month, we are preparing for the migration.

The DB is hosted on AWS Aurora. If we are to simply upgrade the engine to 5.7, we could do it online (RDS Aurora supports blue/green upgrade), or we could create a new DB with 5.7 and use DMS for real time replication with CDC.

Now, the complication is that we also want to change the data type of many of our tables from INT to BIGINT, because we will soon run out of space.

To do this, we are exploring two ways:

  1. Take a snapshot of the original DB (take not of binlog position), restore it to a new 5.7 DB cluster, run ALTER TABLE to change the primary key data type, and the start real time replication from original DB to the new DB.

  2. Create a new 5.7 empty DB. Create tables with the new BIGINT data type and then start replicating from the original DB to fill the tables.

There are few unclear points in both these approaches, that I would like advice on:

  • In approach (#1), the existing indices will be re-indexed to the new BIGINT type, as part of the ALTER TABLE command. However, given the size of the tables, this could take a long time ( a very long time).

  • However, with approach (#2), we are not sure what would happen to the existing indices. Do we have to drop all the existing indices and re-index again with the new data type?

Is there any other more efficient way for us to go about this? We plan to switch-over once the new DB is upto speed with the original DB, so the downtime is the same for both these approaches.

I am not sure which one will be a guaranteed way to preserve the data integrity when changing the data type?

Thank you.

1

There are 1 best solutions below

2
Bill Karwin On

Replication does not fill empty tables. It only replicates changes, not the initial data. You would still have to do your solution 1 first.

I'd recommend to use the free tool pt-online-schema-change to alter the tables. It will actually take longer than using ALTER TABLE, but there's no downtime. It allows you to continue reading and writing to the table while it's making the change, so the length of time is not so nerve-wracking. Also using this solution you don't need to use a replica. You can make the change in place on the primary. At my last job, we ran hundreds of changes every week using pt-online-schema-change.

The only downside to that solution is that it requires exclusive access to the table momentarily at the start and at the end of the process. Unless you tend to have long-running transactions inhibiting this, it's not a problem.

Like any new tool, you should test it out on a non-production instance until you're confident you know how to use the tool. Don't try any tool for the first time on your production data!

Either of the solutions will change all of the secondary indexes as part of the change to the primary key index. Either way you do it, this will happen automatically.