How do I migrate my database schema in a backwards-incompatible way with DataJoint?

88 Views Asked by At

I work with a neuroscience lab that uses DataJoint and a shared lab database to manage our data and data pipelines during data preprocessing and analysis. We have tables for different data types, both raw and processed, and as we use new hardware and analysis methods over the course of several years, we need to revise these schema to support these new ways of storing and working with data. Sometimes, we need to modify table definitions in a backwards-incompatible way, e.g., change the primary key, change relationships between tables, remove tables, or rename tables. However, we want to ensure that users who have started their data processing and analysis on a particular database schema and code can continue to do so. How do we do this in DataJoint?

Here is our current thinking:

One common strategy for backwards-incompatible schema migration is the Expand and Contract pattern (see also this ref).

Adapting that strategy, we would:

  1. Modify the existing schema name so that it is versioned, e.g., version 1 uses “common_ephys_v1” and create a new schema for version 2 “common_ephys_v2”.
  2. Create a branch in the code repo (“v1_v2”) where changes to the database are made in both the v1 and v2 schema. This will add some overhead in both code and transactions. Test the code thoroughly.
  3. Select a time period to pause all changes in the database. Create and run a script using custom SQL to copy data from the v1 schema into the v2 schema and apply the backwards-incompatible changes. Merge the “v1_v2” branch from step 2 into "main".
  4. Let users use this version of the codebase for some time and make sure data is being written correctly.
  5. Create another branch in the code repo (“v2”) and, in that branch, update the DataJoint classes to be compatible with only the new v2 schema.
  6. Tell users who do not need to use the old v1 database schema to adapt their custom code to use the new v2 code and schema.
  7. While there are users of both v1 and v2 code and schema, apply critical bug fixes to both the v1 and v2 classes in their respective branches (“main” and “v2”).
  8. Once all users of v1 have finished with their analysis on the v1 schema, delete the v1 schema and merge the “v2” branch from step 5 into "main".

This seems like a workable, albeit tedious, solution. Is there a better or standard way to do this in DataJoint?

0

There are 0 best solutions below