I am designing a solution for a SQL Server 2012 database for the following scenario:
- The database contains about 1M records with some simple parent-child relationships across 4 or 5 tables
- There is a 24x7 high load of reads on the database
- Once a day we receive a batch of about 1000 inserts, updates and deletes that must be merged into the database; occasionally this number can be higher.
- Apart from the daily batch there are no other writers to the database
There are a few 'special' requirements:
- Readers should not experience any significant latency due to these updates
- The entire batch should be processed atomically from the readers' perspective; a reader should never see a partially processed batch
- If the update fails halfway through, we need to roll back all changes in the batch
- Processing of the batch itself is not time-critical; with a simple implementation it currently takes up to a few minutes, which is fine.
The options I am thinking of are:
Wrap a single database transaction around the entire update batch (this could be a large transaction), and use snapshot isolation to allow readers to read the original data while the update is running.
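For illustration, a minimal sketch of what I mean by this option (the database and table names, ExampleDb and dbo.Product, and the columns are just placeholders):

```sql
-- One-time setup: enable row versioning so readers can see the last
-- committed version of a row instead of blocking on the batch writer.
ALTER DATABASE ExampleDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Readers opt in per session and see a consistent, pre-batch view.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT ProductId, Name, Price FROM dbo.Product;

-- The nightly batch runs in a single transaction; readers never see
-- partial results, and any failure rolls back the whole batch.
BEGIN TRY
    BEGIN TRANSACTION;
    -- apply the ~1000 inserts / updates / deletes here
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```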
Use partition switching. This feature seems to have been designed with this kind of use case in mind. The downside seems to be that before we can start processing the batch we need to create a copy of all the original data.
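A rough sketch of how I imagine the partition switch would work, assuming a hypothetical partitioned table dbo.Product, a staging table dbo.Product_Staging that already contains the post-batch data for one partition (same schema, indexes, filegroup, and a CHECK constraint matching the partition boundary), and an empty holding table dbo.Product_Old:

```sql
-- Switching is a metadata-only operation, but it takes a brief
-- schema-modification (SCH-M) lock on the table.
BEGIN TRANSACTION;

-- 1. Move the current contents of partition 1 out to the holding table.
ALTER TABLE dbo.Product SWITCH PARTITION 1 TO dbo.Product_Old;

-- 2. Move the freshly prepared data into the now-empty partition.
ALTER TABLE dbo.Product_Staging SWITCH TO dbo.Product PARTITION 1;

COMMIT TRANSACTION;
```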
Switch the entire database. We could create a copy of the entire database, process the batch in this copy and then redirect all clients to it (e.g. by changing their connection string). This would even allow us to make the database read-only, and possibly to create multiple copies of the database for scalability.
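On the database side this could look roughly like the following (the database name, logical file names and paths are placeholders); the actual redirect of clients would happen in application configuration:

```sql
-- Build tomorrow's copy from a fresh backup of the live database.
RESTORE DATABASE ProductDb_B
    FROM DISK = N'D:\Backups\ProductDb.bak'
    WITH MOVE 'ProductDb'     TO N'D:\Data\ProductDb_B.mdf',
         MOVE 'ProductDb_log' TO N'D:\Data\ProductDb_B.ldf',
         REPLACE;

-- Apply the daily batch to ProductDb_B here, then freeze it.
ALTER DATABASE ProductDb_B SET READ_ONLY WITH NO_WAIT;

-- Finally, repoint the clients' connection strings from the old copy
-- to ProductDb_B (an application/configuration step, not T-SQL).
```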
Which of these options, or another, would best fit this scenario and why?
Partition switching is not really going to solve your problem; you should consider it the same as doing the work against the database as you have it today, so the rollback/insert would still be blocking. However, the blocking could be isolated to just part of your data rather than all of it.
Your best bet is to use two databases and switch connection strings.
OR use one database with two sets of tables, and views or stored procedures that are swapped to point at the 'active' tables; see the sketch below. You could still have disk contention issues, but from a locking perspective you would be fine.
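As a sketch of that second approach (all names invented): readers always query a view dbo.Product, which points at whichever of dbo.Product_A / dbo.Product_B is currently the active set; here dbo.Product_A is assumed to be live.

```sql
-- 1. Rebuild the inactive copy from the live one and apply the batch to it.
TRUNCATE TABLE dbo.Product_B;
INSERT INTO dbo.Product_B (ProductId, Name, Price)
SELECT ProductId, Name, Price FROM dbo.Product_A;
-- ... apply the ~1000 inserts / updates / deletes to dbo.Product_B here ...

-- 2. Repoint the view in one short transaction. Readers see either the
--    old data set or the new one, never a mixture. ALTER VIEW must be the
--    first statement in a batch, hence the EXEC wrapper.
BEGIN TRANSACTION;
EXEC('ALTER VIEW dbo.Product AS SELECT ProductId, Name, Price FROM dbo.Product_B;');
COMMIT TRANSACTION;
```

The ALTER VIEW takes a brief schema-modification lock, so readers may wait for a moment, but only for that metadata change rather than for the whole batch.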