First call to openrdf-sesame update endpoint is very slow. Is there a way to optimize it?

We have some python scripts to execute both sparql queries and "updates" (an insert/delete). Here is most of the relevant code (I think):

import json
import urllib  # Python 2: provides urllib.urlencode
import httplib2

server = "localhost"
repo = "test"
query_endpoint  = "http://%s:8080/openrdf-sesame/repositories/%s" % (server,repo)
update_endpoint = "http://%s:8080/openrdf-sesame/repositories/%s/statements" % (server,repo)


def execute_query(query):
  params = { 'query': query }
  headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'accept': 'application/sparql-results+json'
  }
  # POST the query to the repository (query) endpoint
  (response, content) = httplib2.Http().request(query_endpoint, 'POST', urllib.urlencode(params),headers=headers)
  # The body is SPARQL JSON results, so parse it as JSON
  return (response,json.loads(content))

def execute_update(query):
  params = { 'update': query }
  headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'accept': 'application/sparql-results+json'
  }
  (response, content) = httplib2.Http().request(update_endpoint, 'POST', urllib.urlencode(params),headers=headers)
  return True

All of our calls to execute_query are very fast, taking less than 1 second to complete. However, the first call to execute_update takes a really long time (16 seconds). Every call after the first one runs in less than 1 second. We're running Sesame version 2.7.12 (we thought upgrading from version 2.7.3 might help, but it didn't help much). We only have 2 or 3 thousand triples. This is all running from CGI scripts, so we can't really just keep a Python session alive to make update calls (anyway, isn't that the workbench's job?). Any ideas on what is taking so long on that first call to the update_endpoint? Are other people having the same issue? Any suggested resolutions?

Thanks!

EDIT: I followed RobV's advice, but I'm still having the same problem. Here is the output from tshark:

 22.577578   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test HTTP/1.1 
 22.578261   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 22.583422   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 200 OK  (application/sparql-results+json)
 22.583857   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 22.591122   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 22.591388   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.020398   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 
 35.025605   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 35.025911   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.040606   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 
 35.045937   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 35.046080   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.049359   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 
 35.053776   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 35.053875   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.056937   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 

You can see the large gap on the first call to the /statements endpoint.
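To put a number on that gap, here is a minimal sketch that pairs each POST in a capture with the next HTTP response and reports the elapsed time. The helper name request_latencies and the whitespace-splitting approach are illustrative assumptions, not part of tshark; the sample lines are copied from the capture above. It is written for Python 3.

```python
# Sample lines copied from the tshark capture above (first two updates only).
CAPTURE = """\
 22.591122   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1
 22.591388   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.020398   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content
 35.025605   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1
 35.025911   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.040606   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content
"""


def request_latencies(capture):
    """Return a list of (path, seconds) for each POST/response pair.

    Hypothetical helper: assumes the column layout shown in the capture,
    i.e. timestamp, source, '->', destination, 'HTTP', then the summary.
    """
    latencies = []
    pending = None  # (timestamp, path) of the POST still awaiting a response
    for line in capture.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        timestamp = float(fields[0])
        if fields[5] == "POST":
            pending = (timestamp, fields[6])
        elif fields[5].startswith("HTTP/") and pending is not None:
            # A status line such as "HTTP/1.1 204 No Content" closes the pair.
            latencies.append((pending[1], timestamp - pending[0]))
            pending = None
    return latencies


if __name__ == "__main__":
    for path, seconds in request_latencies(CAPTURE):
        print("%-50s %.3f s" % (path, seconds))
```

On the sample above, the first update takes roughly 12.4 seconds from POST to response, while the second completes in about 15 milliseconds.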

There are 2 best solutions below

3
Best answer

When we created the repository, we created it as an "In Memory Store" repository. I created a new repository of the "Native Java Store" type, and now my first call is fast (as are all subsequent calls).

2
Answer

The Sesame workbench and the server are two different applications running in separate application contexts within your web application container.

Your CGI code directs queries directly to the Sesame server but directs updates to the Sesame workbench.

Sesame workbench is actually just a UI for the Sesame server and essentially proxies your requests on to the underlying Sesame server. The first time you make an update the Workbench has to establish a connection to the server which I believe involves making various additional requests to the Sesame server for metadata. After this the connection is cached by the workbench which is why subsequent updates run very fast.

Updates can be sent to the Sesame server directly by changing your update endpoint to the Sesame server's /statements endpoint, as detailed in the Sesame HTTP Protocol documentation, e.g.:

update_endpoint = "http://%s:8080/openrdf-sesame/repositories/%s/statements" % (server,repo)

By going directly against the Sesame server you should eliminate the long delay on the first update.
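As a rough illustration of what such a direct update request looks like, the sketch below builds (but does not send) a form-encoded POST to the server's /statements endpoint. It uses the Python 3 standard library rather than the httplib2 and Python 2 urllib calls in the question; the function name build_update_request, the port default, and the sample triple are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(server, repo, update, port=8080):
    """Build a SPARQL update POST aimed at the Sesame server itself.

    Hypothetical helper: the URL layout mirrors the update_endpoint shown
    above; nothing is sent until the request is passed to urlopen().
    """
    url = "http://%s:%d/openrdf-sesame/repositories/%s/statements" % (
        server, port, repo)
    # The update string travels in the form-encoded 'update' parameter.
    body = urlencode({"update": update}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_update_request(
        "localhost", "test",
        "INSERT DATA { <urn:a> <urn:b> <urn:c> }")  # sample triple (assumed)
    print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; judging by the capture in the question, a successful update comes back as 204 No Content.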