We've been using Percona OSC (pt-online-schema-change) for a while now to change our MySQL schema without locking tables, and it has worked great: adding a new column or index to a "large" InnoDB table (~3.8 million rows) typically takes a couple of hours.
However, the latest change I tried was only 40% complete after running for 7 hours (overnight, during our quietest period), and the estimated time remaining was a further 11 hours and still increasing. All but 4GB of the memory on the Red Hat server was in use; the server has 32GB, recently upgraded from 16GB.
So what's going on here? Why would the time taken suddenly jump so much? Have we just reached some kind of threshold that Percona / MySQL / the server can't cope with? Are there any configs we can tweak to improve performance?
The table has 32 columns and 12 indexes (including the primary key and 2 other unique indexes). I know that's a lot but, as I said, until recently it performed just fine.
The table also has several foreign keys pointing to it which we set to update using the drop_swap method.
The full command I used was:
pt-online-schema-change --execute --ask-pass --set-vars innodb_lock_wait_timeout=50 --alter-foreign-keys-method=drop_swap \
  --alter "ADD is_current TINYINT(1) DEFAULT '1' NOT NULL" u=admin,p=XXXXXXX,D=xxxxx_live,t=applicant
The innodb_buffer_pool_size is currently set to 2147483648 (2GB). Should this be increased and, if so, by how much? The web server (Apache/PHP/Symfony) also runs on this box.
The last change I made to this particular table was to change the collation of one field to utf8_bin (the other fields are utf8_unicode_ci). Could that make a difference?
Assuming all else is equal, I'd say either some sort of threshold was crossed or some other process is loading the database. The number of indexes on the table is very high. pt-osc creates a new, empty copy of the table with the alteration applied and then copies the rows into it in "chunks". The chunk size is adjusted dynamically so that each chunk takes about 0.5 seconds to copy (by default). To get more insight, check SHOW PROCESSLIST to see what else is putting pressure on the database, and look at what chunk size pt-osc is using.
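To make the chunking idea concrete, here is a rough sketch in shell. It is a simplified model of adaptive chunk sizing, not pt-osc's actual code: the function names next_chunk_size and show_db_pressure are made up for illustration, and the real tool smooths its estimate over several chunks rather than using only the last one. The point is just that when the server slows down, the next chunk gets smaller, so a loaded server shows up as small chunks and a growing time estimate.

```shell
#!/bin/sh
# Simplified model of adaptive chunk sizing (NOT pt-osc's actual code):
# given how many rows the last chunk copied and how long it took,
# pick a next chunk size that should take roughly target_ms to copy.
next_chunk_size() {
    rows=$1               # rows copied in the last chunk
    elapsed_ms=$2         # time the last chunk took, in milliseconds
    target_ms=${3:-500}   # target time per chunk; 500 ms mirrors the 0.5 s default
    # Avoid dividing by zero when a chunk finishes in under a millisecond
    [ "$elapsed_ms" -gt 0 ] || elapsed_ms=1
    echo $(( rows * target_ms / elapsed_ms ))
}

# Hypothetical helper: list what else is running while the copy is in progress.
# Assumes the mysql client is installed and uses the "admin" user from the question.
show_db_pressure() {
    mysql -u admin -p -e "SHOW FULL PROCESSLIST"
}

# Healthy server: 1000 rows took 250 ms, so the next chunk grows.
next_chunk_size 1000 250     # prints 2000
# Struggling server: 1000 rows took 4000 ms, so the next chunk shrinks.
next_chunk_size 1000 4000    # prints 125
```

If the chunk sizes you see are tiny compared with earlier, successful runs, that points at the server being slow per row (I/O, memory, or competing queries) rather than at pt-osc itself.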