I have a rather complex web application which creates HTML pages in a Cassandra database.
While creating the page, it saves a status in that page reflecting the fact that it is being worked on.
INSERT INTO content (key, column1, value)
VALUES ('http://domain/path', 'content:status', 0x0201);
(The column names come from the Thrift days...)
While the status is 0x0201, no other process can do anything to the page. It is viewed as being locked.
Once done creating the page, within a millisecond or so, I switch the status to "normal". This is another insert of the content:status field.
INSERT INTO content (key, column1, value)
VALUES ('http://domain/path', 'content:status', 0x0102);
Here the status changes from 0x0201 to 0x0102. However, of the roughly 700 pages that I create during a website initialization, the status does not change for 22 to 30 of them (3% to 4%).
Could this happen because the time between the first INSERT INTO and the second one is too short and the Cassandra cluster gets confused? (That is, it sees both as arriving at pretty much the same time and selects one of them, which happens to be the wrong one in the few cases where it fails.)
When using the C++ driver (and others, I'm sure), the two INSERT commands may end up being sent through two different pipelines. This is because the driver manages worker threads, and a command can end up in the pipeline of any one of them.
This means that even if you send CREATE and later NORMAL, the thread pipelines may end up sending NORMAL first and then CREATE to Cassandra (i.e. swapping the order in which the data was handed to the C++ driver). Then you end up with a status of CREATE...
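To make the swap concrete, here is a small Python simulation rather than real driver code. It assumes that each write is stamped when it arrives and that the value with the latest stamp wins; the names `Cell` and `apply_in_arrival_order` are made up for this illustration and are not part of any Cassandra API.

```python
import itertools

CREATE = 0x0201  # page is being worked on (locked)
NORMAL = 0x0102  # page is ready


class Cell:
    """One column value, resolved by last-write-wins on the write timestamp."""

    def __init__(self):
        self.value = None
        self.timestamp = -1

    def write(self, value, timestamp):
        # Keep whichever write carries the higher timestamp.
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp


def apply_in_arrival_order(writes):
    """Apply writes, stamping each one as it arrives (assumed behavior)."""
    cell = Cell()
    clock = itertools.count()
    for value in writes:
        cell.write(value, next(clock))
    return cell.value


# Order in which the application handed the writes to the driver.
in_order = apply_in_arrival_order([CREATE, NORMAL])
# Order after two pipelines swapped them on the way out.
swapped = apply_in_arrival_order([NORMAL, CREATE])

print(in_order == NORMAL, swapped == CREATE)
```

Under these assumptions, the swapped case leaves the page stuck at CREATE even though the application issued NORMAL last.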
This cannot be resolved directly. Instead, you may want to hold a lock while working on the page; once the work is done, update the status to NORMAL in case it was something else, then unlock. If the lock has a timeout, you should never end up in a complete deadlock (e.g. one process handling page A then B without releasing the lock on page A before working on B, while another process handles page B then A...).
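A minimal sketch of that lock-with-timeout pattern, in Python for brevity. It uses an in-process `threading.Lock` per page, so it only illustrates the control flow; a real deployment with several processes would need a lock shared between them. `PageLocks`, `work_on_page`, and the `statuses` dictionary standing in for the content table are all hypothetical names.

```python
import threading

CREATE = 0x0201  # page is being worked on (locked)
NORMAL = 0x0102  # page is ready


class PageLocks:
    """Hands out one lock per page key (in-process stand-in for a shared lock)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


def work_on_page(locks, statuses, key, work, timeout=5.0):
    """Run work(key) under the page lock; return False if the lock timed out."""
    lock = locks.lock_for(key)
    # The timeout keeps two workers from waiting on each other forever.
    if not lock.acquire(timeout=timeout):
        return False
    try:
        work(key)
        # Force the status to NORMAL in case it was something else.
        statuses[key] = NORMAL
    finally:
        lock.release()
    return True


locks = PageLocks()
statuses = {"http://domain/path": CREATE}
done = work_on_page(locks, statuses, "http://domain/path", lambda key: None)
print(done, statuses["http://domain/path"] == NORMAL)
```

A caller that gets `False` back can retry later or report the page as busy, instead of blocking indefinitely.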