How to handle db data stored according to previous application version during rolling update

80 Views Asked by At

Let's say I have an system running on Kubernetes using rolling updates or AWS ECS using blue/green deployments or any other solution which offers zero-downtime deployments. The key point here is that new and existing version can coexist and they use the same relational database.

We have a table in the database called users with a field username which we want to split in the new version into two separate fields: firstname and lastname. To do that, we change the service code accordingly and do migration of existing records. This scenario also caters for potential rollbacks. This scenario should also cater for potential rollbacks.

During the deployment we first run the migration then run the rolling update. As far as I understand, this can lead to situation, where the migration is done, but the service with the previous versions is still consuming traffic. That's why we do the migration in a way, that it supports working both with the current version and the new version. We add new fields, but not removing the old one yet. username field will be removed in a later release.

After deployment is finished, old version gets disabled and only new version is running, saving firstname and lastname separatelly. But here we end up in a situation, where if during the transition time a record was stored by the previous version, it will be stored in username field. These records are not subject of the db migraiton, because they were created after in was launched.

How to overcome that situation? Should the db migration (only the data migration part) be launched after the deployment or maybe there is a gap in the process described?

2

There are 2 best solutions below

0
On

You mentioned blue/green or other zero-downtime deployments so I am assuming that is the important piece here. Not that it can be done, but that it IS done with zero-outage.

  1. You have control of the new application, why not make it handle both migrated and not-yet-migrated data? Instead of migrating the data before swap, migrate after swap. Any new data gets inserted correctly while old data gets migrated and the app handle the missing data by splitting the username field at run-time. Then remove this version-handling code when you remove the column later. Possibly a bit more difficult but the goal is zero-outage and not the easiest path to done.

  2. If your database has insert triggers: after adding the columns to the table, you can add an insert trigger that fires when username column is inserted. The trigger itself would execute the same split logic as the migration and update the firstname and lastname columns when new data is inserted from the old app. The migration, which is updating (not inserting) will not fire the trigger, and the new app code will not include the username column, so it will also not fire the trigger. This would allow you to run the migration while the old app runs and then swap to the new app without needing the additional app code from the first method. Just remove the now unused trigger when you remove the now unused username column.

I think the important thing is, make sure your updates don't take excessive exclusive locks using proper batching techniques. Blocking read operations can quickly spiral into a huge queue of blocked processes that can bring your database server down ie. use small indexed update batches that are held very briefly. This could make the migration take longer, but who cares if the site is running just fine and any resulting read blocks are only held for milliseconds and the users don't notice.

1
On

This is actually a good question and one that needs to be architected into the manner in which you do the upgrade.

Most commonly this is worked around by locking out clients during the upgrade. One way is like this:

  1. Change the password of the DB user that the clients use, or use the maintenance mode if the client supports it (e.g. if it's a website they may have a mode that displays "This site is under temporary maintenance please try again later). Other sites just allow the site to go to a 5xx error during the upgrade.
  2. Perform the upgrade. Since the clients can no longer access it, you won't have any chance of a legit update
  3. Wait for the rollout to complete
  4. Test the new version (use the new credentials here)
  5. Change the password back to the original password
  6. Check you can still authenticate using the original credentials
  7. Remove the maintenance mode if enabled
  8. Check the clients are able to connect and monitor for any errors