OpenRDF Sesame: how to handle locking?


On my Apache Tomcat server I have an OpenRDF Sesame triplestore that handles RDF triples about users and documents, and bidirectional links between these entities:

http://local/id/doc/123456 myvocabulary:title "EU Economy"
http://local/id/doc/456789 myvocabulary:title "United States Economy"

http://local/id/user/JohnDoe myvocabulary:email "[email protected]"
http://local/id/user/JohnDoe myvocabulary:hasWritten http://local/id/doc/123456

These triples state that the user John Doe, whose email is "[email protected]", has written the book "EU Economy".

A Java application running on multiple clients uses this server through an HTTPRepository to insert, update, and remove such triples.
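
For context, here is a minimal sketch of how such a client might obtain a connection; the server URL and repository ID are illustrative assumptions, not values from the question:

import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.http.HTTPRepository;

public class SesameClient {
    public static void main(String[] args) throws Exception {
        // Server URL and repository ID are assumptions for illustration.
        Repository repo = new HTTPRepository("http://localhost:8080/openrdf-sesame", "myRepo");
        repo.initialize();

        RepositoryConnection conn = repo.getConnection();
        try {
            // insert / update / remove triples here
        } finally {
            conn.close();
            repo.shutDown();
        }
    }
}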

Problems come from concurrent connections. If one Java client deletes the book "456789" while another client simultaneously links the same book to "John Doe", we can end up in a situation where "John Doe" links to a book that no longer exists.

To try to find a solution, I split the work into two transactions (both are sketched in code after the lists below). The first one is (T1):

  • (a) Check if book id exists (i.e. "456789").

  • (b) If yes, link the given profile (i.e. "JohnDoe") to this book.

  • (c) If no, return an error.

The second one is (T2):

  • (d) Delete book by id (i.e. "456789").
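
Sketched in Java against the Sesame Repository API, the two transactions might look roughly like this, as two helper methods on the client; the vocabulary namespace and the method names are assumptions based on the triples above:

import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.RepositoryConnection;

// T1: link a profile to a book, but only if the book still exists.
void linkBookToProfile(RepositoryConnection conn, String profileId, String bookId) throws Exception {
    ValueFactory vf = conn.getValueFactory();
    URI book = vf.createURI("http://local/id/doc/" + bookId);
    URI profile = vf.createURI("http://local/id/user/" + profileId);
    URI hasWritten = vf.createURI("http://local/myvocabulary#hasWritten"); // assumed namespace

    conn.begin();
    try {
        if (conn.hasStatement(book, null, null, false)) {    // (a) book still described?
            conn.add(profile, hasWritten, book);              // (b) link profile to book
            conn.commit();
        } else {
            conn.rollback();                                  // (c) report an error
            throw new IllegalStateException("Book " + bookId + " does not exist");
        }
    } catch (Exception e) {
        if (conn.isActive()) conn.rollback();
        throw e;
    }
}

// T2: delete every statement about the book.
void deleteBook(RepositoryConnection conn, String bookId) throws Exception {
    URI book = conn.getValueFactory().createURI("http://local/id/doc/" + bookId);
    conn.begin();
    conn.remove(book, null, null);                            // (d) delete book by id
    conn.commit();
}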

The problem is that if the sequence is (T1,a) (T2,d) (T1,b) (T1,c), there are again consistency issues.

My question is: how can I handle locking (like MySQL's FOR UPDATE or GET_LOCK) to properly isolate such transactions with Sesame?


Best Answer

Older versions of Sesame (2.7.x and older) offer no transaction isolation over HTTP. Over an HTTP connection, transactions merely batch operations together on the client side; no lock is obtained from the server, so there is no way to control isolation in this scenario.

So the only way to deal with this in older Sesame versions is to be robust in your queries, rather than relying on full data consistency (which is a bit of an odd concept in a schemaless/semi-structured data paradigm anyway). For example in this particular case, make sure that when you query for the books linked to a profile, the book data is actually there - don't just rely on the reference.
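
For instance, something along these lines (assuming an open RepositoryConnection conn and the vocabulary namespace from the question) only returns books whose data is still present, so a dangling hasWritten reference is simply skipped:

String query =
    "PREFIX myvocabulary: <http://local/myvocabulary#> \n" +
    "SELECT ?book ?title WHERE { \n" +
    "  <http://local/id/user/JohnDoe> myvocabulary:hasWritten ?book . \n" +
    "  ?book myvocabulary:title ?title . \n" +   // the join ensures the book data actually exists
    "}";

TupleQueryResult result = conn.prepareTupleQuery(QueryLanguage.SPARQL, query).evaluate();
try {
    while (result.hasNext()) {
        BindingSet bs = result.next();
        System.out.println(bs.getValue("book") + " : " + bs.getValue("title"));
    }
} finally {
    result.close();
}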

In Sesame 2.8 and newer, however, full transaction isolation support is available over HTTP, and additional control over the exact transaction isolation level is available as well, on a per-transaction basis. The locking scheme is dependent on the specific triplestore implementation you use.

Sesame's native store uses optimistic locking, which means it assumes a transaction will be able to make the update it wants, and throws an exception when a conflict occurs. Setting the isolation level for a transaction controls how the store handles locking for concurrent transactions. The Sesame Programmer's manual has more details on transaction handling and the available isolation levels. The default isolation level for transactions on the native store is SNAPSHOT_READ.
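
For example, to start a transaction at a stricter level than the default (a sketch, assuming an open RepositoryConnection conn against a Sesame 2.8+ repository):

// IsolationLevels lives in org.openrdf.IsolationLevels (Sesame 2.8 and newer)
conn.begin(IsolationLevels.SERIALIZABLE);   // or SNAPSHOT, SNAPSHOT_READ, ...
try {
    // ... check and update statements ...
    conn.commit();
} catch (RepositoryException e) {
    // with optimistic locking, a conflicting concurrent transaction
    // typically surfaces as an exception when the update is committed
    conn.rollback();
}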

As for your example transactions: at the default isolation level, T1 and T2 each observe a consistent snapshot of the store for their queries, and the sequence as you sketch it plays out: T1 sees that the book exists and therefore adds it to the profile, and T2 gets to delete it. The end result is that the profile is linked to a non-existent book - but strictly speaking that is not an inconsistency, because T2 does not do any verification of whether a particular book is used in a profile or not.

No matter which transaction isolation level you use, if in your scenario T2 is executed after T1, the end result will be a link to a non-existent book. If you want to ensure that you cannot get into that situation, you need to extend T2 to check that the book about to be deleted is not linked to a profile, and run T2 at isolation level SNAPSHOT or SERIALIZABLE.
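
A rough sketch of that extended T2 follows; the URIs and the helper name are assumptions, the point is the existence check combined with the stricter isolation level:

// T2 extended: refuse to delete a book that is still linked from any profile.
void deleteBookIfUnused(RepositoryConnection conn, String bookId) throws Exception {
    ValueFactory vf = conn.getValueFactory();
    URI book = vf.createURI("http://local/id/doc/" + bookId);
    URI hasWritten = vf.createURI("http://local/myvocabulary#hasWritten"); // assumed namespace

    conn.begin(IsolationLevels.SERIALIZABLE);   // or IsolationLevels.SNAPSHOT
    try {
        if (conn.hasStatement(null, hasWritten, book, false)) {
            conn.rollback();
            throw new IllegalStateException("Book " + bookId + " is still linked to a profile");
        }
        conn.remove(book, null, null);
        conn.commit();
    } catch (Exception e) {
        if (conn.isActive()) conn.rollback();
        throw e;
    }
}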