I have a highly normalized database with many foreign keys, and my application performs calculations on this data. First, my application loads numerous entities for a given customer (say, by customer_id). During the data load phase, which can take up to 15 minutes, rows in linked tables can be deleted by another service, leading to inconsistent data and crashing my process.
A colleague has suggested wrapping the load phase in a transaction and setting the isolation level to SERIALIZABLE. While this seems like a potential solution, I'm concerned about potential drawbacks.
What are the potential drawbacks of using a SERIALIZABLE isolation level for this scenario?
Are there better alternatives or best practices to handle this issue and maintain data consistency during the load phase?
EDIT:
I would like to shed more light on the scenario. At any given time, my database is indeed consistent and complete. Let's break it down: given a DB with tables A, B, C (some tables have 50-80M rows):
- My application loads all rows of customer X (NOT in a transaction):
1.1. Load N rows from table A (5M rows)
1.2. Load N rows from table B (10K rows)
1.3. Load N rows from table C (8M rows)
- While 1.1 and 1.2 are running, some other process deletes rows in tables A and B (in a transaction, or via 'ON DELETE CASCADE' triggers).
The point is that after this deletion the data is still complete, BUT my process has already loaded rows from table A that are now obsolete. Then, inside my application, when it encounters a dangling row, it crashes.
The suggested idea is to wrap 1.1...1.3 in a transaction so the data is ALWAYS complete from the application's perspective.
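For reference, a minimal sketch of that suggestion, assuming PostgreSQL; table_a, table_b, table_c and customer_id = 42 are placeholders for the real schema. Inside the transaction, all three SELECTs read from the same snapshot, taken at the first statement:

```sql
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT * FROM table_a WHERE customer_id = 42;   -- step 1.1
SELECT * FROM table_b WHERE customer_id = 42;   -- step 1.2
SELECT * FROM table_c WHERE customer_id = 42;   -- step 1.3

COMMIT;
```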
The SERIALIZABLE isolation level doesn't keep you from getting foreign key violation errors. If anything, you will get more errors, because any possible anomaly will lead to a serialization error, which forces you to repeat the whole transaction. There is no way you can prevent problems if a concurrent transaction deletes data that the data you want to insert depends on.
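To make the retry cost concrete: in PostgreSQL a serialization failure surfaces as SQLSTATE 40001, the transaction can only be rolled back, and the retry loop has to live in the application code. A sketch of the failure path, with the same placeholder names (the exact error text varies):

```sql
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT * FROM table_a WHERE customer_id = 42;   -- minutes of work already done...
-- Any later statement, or the COMMIT itself, may abort with
-- SQLSTATE 40001 (serialization_failure), e.g.
-- "could not serialize access due to read/write dependencies among transactions"

ROLLBACK;   -- everything read so far is thrown away,
            -- and the application must restart the whole load from step 1.1
```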
The only way forward that I see is to suspend any concurrent conflicting deletions while you load data. If you cannot guarantee that from the application, you could use database techniques: put a SELECT ... FOR KEY SHARE lock on all the data you depend on to prevent concurrent sessions from deleting them. That forces you to perform the whole data load in a single transaction, which is not nice, considering the length of the procedure. But if those data loads don't happen all the time, it may well be acceptable.
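A sketch of that approach, again assuming PostgreSQL and placeholder names, combining the lock with the read so each table is scanned once. A concurrent DELETE (or an ON DELETE CASCADE fired by one) that touches the locked rows then blocks until this transaction ends, instead of removing data out from under the load:

```sql
BEGIN;  -- the row locks below are held until COMMIT/ROLLBACK,
        -- so the whole load must stay inside this one transaction

-- FOR KEY SHARE still allows reads and non-key updates by other sessions,
-- but blocks deletion (and key updates) of the locked rows.
SELECT * FROM table_a WHERE customer_id = 42 FOR KEY SHARE;   -- step 1.1
SELECT * FROM table_b WHERE customer_id = 42 FOR KEY SHARE;   -- step 1.2
SELECT * FROM table_c WHERE customer_id = 42 FOR KEY SHARE;   -- step 1.3

COMMIT;  -- releases the locks; the deleting service can proceed again
```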