Let's say I have a system running on Kubernetes using rolling updates, or AWS ECS using blue/green deployments, or any other solution that offers zero-downtime deployments. The key point here is that the new and existing versions can coexist and that they use the same relational database.
We have a table in the database called `users` with a field `username`, which we want to split in the new version into two separate fields: `firstname` and `lastname`. To do that, we change the service code accordingly and migrate the existing records. This scenario should also cater for potential rollbacks.
During the deployment we first run the migration, then run the rolling update. As far as I understand, this can lead to a situation where the migration is done but the service with the previous version is still consuming traffic. That's why we write the migration so that it supports both the current version and the new version: we add the new fields but do not remove the old one yet. The `username` field will be removed in a later release.
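For concreteness, the "add but don't remove" step could look like the following. This is only a sketch using SQLite via Python's `sqlite3` as a stand-in database, assuming usernames split at the first space; real schemas and migration tooling will differ:

```python
import sqlite3

# Illustrative schema; SQLite stands in for the real database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('John Doe')")

# Expand phase: add the new columns, keep the old one in place.
conn.execute("ALTER TABLE users ADD COLUMN firstname TEXT")
conn.execute("ALTER TABLE users ADD COLUMN lastname TEXT")

# Data migration: split existing usernames at the first space.
conn.execute("""
    UPDATE users
    SET firstname = substr(username, 1, instr(username, ' ') - 1),
        lastname  = substr(username, instr(username, ' ') + 1)
    WHERE username IS NOT NULL AND instr(username, ' ') > 0
""")
```

Because the migration only adds columns, the previous service version keeps working against the same table throughout the rollout.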
After the deployment is finished, the old version is disabled and only the new version is running, saving `firstname` and `lastname` separately. But here we end up in a situation where, if a record was stored by the previous version during the transition time, it is stored in the `username` field. These records are not subject to the data migration, because they were created after it was run.
How can I overcome that situation? Should the db migration (only the data migration part) be launched after the deployment, or is there a gap in the process described?
You mentioned blue/green or other zero-downtime deployments, so I am assuming that is the important piece here. Not that it can be done, but that it IS done with zero outage.
You have control of the new application, so why not make it handle both migrated and not-yet-migrated data? Instead of migrating the data before the swap, migrate after the swap. Any new data gets inserted correctly, while old data gets migrated and the app handles the missing data by splitting the `username` field at run time. Then remove this version-handling code when you remove the column later. Possibly a bit more difficult, but the goal is zero outage, not the easiest path to done.
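That run-time fallback could be as small as the sketch below; `resolve_name` and the dict-shaped row are hypothetical names, and it assumes usernames split at the first space:

```python
def resolve_name(row):
    """Return (firstname, lastname) for a dict-shaped record from the users table.

    Handles both migrated rows (new columns populated) and legacy rows
    that the background migration has not reached yet.
    """
    if row.get("firstname") or row.get("lastname"):
        return row.get("firstname") or "", row.get("lastname") or ""
    # Legacy row: derive the names by splitting username at the first space.
    first, _, last = (row.get("username") or "").partition(" ")
    return first, last
```

The key property is that reads never depend on the migration having finished, so the migration can run at leisure after the swap.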
If your database has insert triggers: after adding the columns to the table, you can add an insert trigger that fires when the `username` column is populated. The trigger itself would execute the same split logic as the migration and update the `firstname` and `lastname` columns when new data is inserted by the old app. The migration, which is updating (not inserting), will not fire the trigger, and the new app code will not include the `username` column, so it will also not fire the trigger. This would allow you to run the migration while the old app runs and then swap to the new app without needing the additional app code from the first method. Just remove the now-unused trigger when you remove the now-unused `username` column.
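A sketch of such a trigger, again demonstrated with SQLite through Python's `sqlite3` and a split-at-first-space assumption; trigger syntax varies by engine (PostgreSQL, for example, requires a separate trigger function), so treat this as the shape of the idea, not portable DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT,"
             " firstname TEXT, lastname TEXT)")

# Fire only for inserts that carry a username (i.e. writes from the old app);
# the UPDATE-based migration and the new app's inserts leave it silent.
conn.execute("""
    CREATE TRIGGER split_username AFTER INSERT ON users
    WHEN NEW.username IS NOT NULL
    BEGIN
        UPDATE users
        SET firstname = substr(NEW.username, 1, instr(NEW.username, ' ') - 1),
            lastname  = substr(NEW.username, instr(NEW.username, ' ') + 1)
        WHERE id = NEW.id;
    END
""")

# Old app writes username: the trigger populates the new columns.
conn.execute("INSERT INTO users (username) VALUES ('John Doe')")
# New app writes the new columns and omits username: no trigger action.
conn.execute("INSERT INTO users (firstname, lastname) VALUES ('Jane', 'Roe')")
```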
I think the important thing is: make sure your updates don't take excessive exclusive locks, using proper batching techniques. Blocking read operations can quickly spiral into a huge queue of blocked processes that can bring your database server down, i.e. use small, indexed update batches whose locks are held very briefly. This could make the migration take longer, but who cares if the site is running just fine, any resulting read blocks are only held for milliseconds, and the users don't notice.
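A minimal sketch of that batching pattern, once more against an in-memory SQLite database via `sqlite3` (the function name, batch size, and split-at-first-space rule are all illustrative assumptions):

```python
import sqlite3

def migrate_in_batches(conn, batch_size=1000):
    """Split username into firstname/lastname in small, short-lived batches.

    The WHERE clause makes the job resumable; on a real database the
    filtered columns should be indexed so each batch is located quickly.
    """
    while True:
        rows = conn.execute(
            "SELECT id, username FROM users "
            "WHERE username IS NOT NULL AND firstname IS NULL "
            "LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        with conn:  # one brief transaction (and thus lock) per batch
            for row_id, username in rows:
                first, _, last = username.partition(" ")
                conn.execute(
                    "UPDATE users SET firstname = ?, lastname = ? WHERE id = ?",
                    (first, last, row_id),
                )

# Demo: five legacy rows migrated two at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT,"
             " firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)",
                 [(f"User {n}",) for n in range(5)])
migrate_in_batches(conn, batch_size=2)
```

Pausing briefly between batches (a short sleep, or a throttle based on replication lag) further reduces pressure on concurrent readers.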