I fail to see how the serializable isolation level can be implemented without locking the tables in question in their entirety - yet that appears to be possible, since all the material on the subject I could find talks only about row and range locks.
I found an interesting example in a discussion of the serializable and SQL Server's snapshot isolation levels here.
The example is about having two marbles, one black, one white, and two transactions that concurrently change all black marbles into whites and vice versa. My question is: How does the serializable isolation level prevent an outcome where the end result is again one white and one black marble?
I've tried the following on SQL Server:
create table marbles (id int primary key, color char(5))
insert marbles values(1, 'White')
insert marbles values(2, 'Black')
I now start two transactions, one being
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
-- returns 1
SELECT id FROM marbles m WHERE m.color = 'White';
UPDATE marbles SET color = 'Black' WHERE id = 1;
COMMIT
the other being the analogous:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
-- returns 2
SELECT id FROM marbles m WHERE m.color = 'Black';
UPDATE marbles SET color = 'White' WHERE id = 2;
COMMIT
The selects are meant to mimic reads by a programmatic client that later uses the returned id to issue the corresponding update.
If I run both transactions up to the point just after their select statements and then let them continue, they deadlock. This is what we would wish to happen (or at least it's better than an end result of mixed colors).
I then changed the updates to set the colors to green and yellow, respectively - and the transactions still deadlock, although this time they really shouldn't. If the white marble is changed to yellow, it would never have been selected by the transaction that was interested in the black marbles - so why the deadlock?
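One way to see why is to inspect the locks held while the first transaction sits between its SELECT and its UPDATE. A sketch using the `sys.dm_tran_locks` DMV (run from a separate session; the column selection is just illustrative):

```sql
-- Run in a third session while the SERIALIZABLE transactions are open.
-- Under SERIALIZABLE, the SELECT should be holding RangeS-S key-range
-- locks; these are what the other transaction's UPDATE collides with,
-- regardless of which color the update writes.
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
ORDER BY request_session_id;
```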
In order to implement serializable properly (locking in exactly the necessary cases), the engine would have to keep a record of each transaction's selects in such a way that it could determine how other transactions' updates affect those result sets. I can see how that would work in simple cases such as this example or simple range queries, but in the general case it strikes me as a very complicated problem.
What I find puzzling is that no source I found on the topic mentions this problem. They all just state something to the effect of "the serializable isolation level is implemented with clever locking, see range locks" and suggest that this is the whole story. But it isn't nearly the whole story, is it?
In particular, when a database engine claims to "support serializable", it's not at all clear what that actually means: it could be as crude as locking everything each transaction touches, and it's probably never the other extreme of locking only what is strictly necessary, as that is too complex to implement.
In contrast, it's clear what exactly SQL Server's snapshot isolation level really means.
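For reference, snapshot isolation has to be enabled per database before a session can request it. A minimal sketch (the database name `MyDb` is a placeholder):

```sql
-- Enable snapshot isolation for the database (one-time setup).
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Then, in a session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
-- Reads now see a consistent row-versioned snapshot; under plain
-- snapshot isolation, the marble example commits with mixed colors
-- instead of deadlocking.
```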
I'd be interested to know whether I see this correctly, and if so, whether people have pointers to further reading on the subject, especially regarding SQL Server.
[EDIT: A similar question in the postgres world had been asked here.]
[EDIT2: The postgres wiki also has a discussion on this very example here.]
The plan shows a clustered index scan. There are two rows in the table, but three key-range locks are taken out.

There is no suitable index on color for the WHERE clause to seek on, so the select query effectively ends up locking the whole table. The last lock shown blocks the range up to "infinity" (ffffffffffff).

Even if there were a suitable index on color and the execution plan used it, the exact range locked would depend on what other rows exist in the table (including any deleted "ghost" records not yet cleaned up).

A more detailed explanation is available here.
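To see the narrower locking behavior described above, you could add such an index yourself; a sketch (the index name is arbitrary):

```sql
-- Hypothetical: give the WHERE clause something to seek on. With this
-- index, the key-range locks taken under SERIALIZABLE can cover only
-- the range around the matching color values instead of the whole
-- table - though, as noted, the exact range still depends on what
-- other rows (and ghost records) exist in the index.
CREATE INDEX ix_marbles_color ON marbles (color);
```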