Why is Percona pt-online-schema-change performing so badly?


We've been using Percona's pt-online-schema-change for a while now to make changes to our MySQL schema without locking tables, and it has worked great: it typically adds a new column or index to our "large" InnoDB tables (~3.8 million rows) within a couple of hours.

However, the last change I tried was only 40% complete after running for 7 hours (overnight, during our quietest period), with an estimate of a further 11 hours to complete, and that estimate keeps increasing. All but 4GB of the 32GB of memory on the Red Hat server was in use (we recently upgraded the box from 16GB).

So what's going on here? Why has the time taken suddenly jumped so high? Have we just reached some threshold that Percona / MySQL / the server can't cope with? Are there any settings we can tweak to improve performance?

The table has 32 columns and 12 indexes (including the primary key and 2 other unique indexes). I know that's a lot, but, as I say, until recently it performed just fine.

The table also has several foreign keys pointing to it, which we told the tool to update using the drop_swap method.

The full command I used was:

pt-online-schema-change --execute --ask-pass \
  --set-vars innodb_lock_wait_timeout=50 \
  --alter-foreign-keys-method=drop_swap \
  --alter "ADD is_current TINYINT(1) DEFAULT '1' NOT NULL" \
  u=admin,p=XXXXXXX,D=xxxxx_live,t=applicant

The innodb_buffer_pool_size is currently set to 2147483648 (2GB) - should this be increased? If so, by how much? The web server (Apache/PHP/Symfony) also runs on this box.
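
For reference, here's how I've been checking the current setting and how full the pool is (a minimal check using standard MySQL variables and status counters):

-- Confirm the configured buffer pool size (bytes -> GB)
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

-- See how many pool pages are free vs. holding data
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%';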

The last change I made to this particular table was to switch the collation of one field to utf8_bin (the other fields are utf8_unicode_ci) - could that make a difference?


There are 2 best solutions below

Answer 1:

All things being equal, I'd say some sort of threshold was crossed or some other process is loading the database. The number of indexes you have is very high: every secondary index has to be maintained for each row copied. pt-osc creates a new, empty, altered table and then copies rows over in "chunks", dynamically adapting the chunk size so that each chunk takes 0.5 seconds (by default). You can check SHOW PROCESSLIST to see what is pressuring the database, and also what chunk size pt-osc is currently using, to get more insight.
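
Something along these lines (the `_applicant_new` name follows pt-osc's `_<table>_new` convention; the exact statement text varies by version):

SHOW FULL PROCESSLIST;

-- While pt-osc is copying, you should see chunked inserts roughly like this;
-- the key range in the WHERE clause reveals the current chunk size:
--   INSERT LOW_PRIORITY IGNORE INTO `xxxxx_live`.`_applicant_new` (...)
--   SELECT ... FROM `xxxxx_live`.`applicant`
--   WHERE ((`id` >= '...')) AND ((`id` <= '...'))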

Answer 2:

How large is this table in terms of MB/GB?

InnoDB caches its pages in the InnoDB buffer pool (innodb_buffer_pool_size), and this is critical for performance. On dedicated hosts with more than 4GB of RAM, the usual guideline is to give around 70-80% of memory to the InnoDB buffer pool (on a dedicated 32GB host that would be roughly 22-26GB; with other services sharing the box you would aim lower).

Use the SQL in this post to gather the logical sizes of your tables and indexes:

https://www.percona.com/blog/2008/03/17/researching-your-mysql-table-sizes/

With that info you'll be able to tell immediately whether the MySQL instance (the InnoDB engine) is being starved of memory.
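
In case the link goes away, a minimal version of that kind of query (assuming the `xxxxx_live` schema from your command; the information_schema columns are standard):

SELECT table_name,
       ROUND(data_length / 1024 / 1024)  AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'xxxxx_live'
ORDER BY data_length + index_length DESC;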

If your working dataset fits in memory, great; if not, you're likely incurring cache misses, and MySQL will need to perform IO against disk to swap pages into the buffer pool. (IO is always a PITA in DB land.)
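
A quick way to see whether you're missing the cache (these are standard InnoDB status counters; sample them twice during the copy and compare the deltas):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests'; -- logical reads
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';         -- reads that had to hit disk
-- Hit ratio ~= 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests);
-- anything noticeably below ~0.99 during the copy suggests the pool is too small.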

The nature of pt-osc's work is to create a new, modified copy of the table and back-fill it with the rows from the original. New rows are also inserted, updated, or deleted in the copy via triggers the tool sets up on the original table. To perform the backfill it has to touch every row in the table at some point, and much of the table may be cold (not resident in the buffer pool in RAM). So you've got a modest amount of RAM on the machine, but InnoDB is only seeing 2GB of it.
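
Roughly, the insert trigger looks like this (a sketch only; the real trigger names and column lists are generated by the tool and vary by version, and `id` here stands in for your actual primary key):

-- Keeps the copy in sync while the backfill runs
CREATE TRIGGER pt_osc_applicant_ins AFTER INSERT ON applicant
FOR EACH ROW
  REPLACE INTO _applicant_new (id /* , ...remaining original columns... */)
  VALUES (NEW.id /* , ... */);

The new is_current column isn't listed, so rows landing in the copy simply take its DEFAULT '1'.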

You have applications running on the server too, so it will take some observation on your side to tune, but I would expect you can considerably raise the amount of memory allocated to the buffer pool. I would also expect that much of your "used" RAM is actually sitting in the filesystem cache rather than being used by applications.
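
For example (the 12G figure is purely illustrative; pick a value based on what Apache/PHP actually needs):

-- MySQL 5.7+ can resize the buffer pool online:
SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;
-- On older versions, set innodb_buffer_pool_size = 12G in my.cnf and restart mysqld.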

If your table is only a few hundred MB (which I doubt, given ~4 million rows and a wide schema) then perhaps there are deeper issues to review, but I'm confident that with a bigger buffer pool you'll see much better performance.

Also, it's worth checking that your innodb_log_file_size is tuned to your workload. This matters because larger redo logs let MySQL defer and batch IO. What is its current size?
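
One common way to sanity-check it (a sketch using standard status variables; the usual rule of thumb is that the redo logs should hold roughly an hour of writes):

-- Current redo log file size
SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';

-- Sample the bytes written to the redo log, wait 60 seconds, sample again;
-- the per-minute delta x 60 approximates an hour of redo writes:
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';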