We are trying to scale up OrientDB in distributed mode but are facing performance bottle neck.
Our use case
Data model and Processing methodology:
We have sub-graph message that we are reading from kafka and persisting/appending to the graph in database. A sub-graph message can multiple vertices/edges that are to be saved/updated in graph. These vertices/edges could or couldn’t exist with the database so we have to query database first they exists or not. If they exists, we have to check if any property is changed or not and have to perform a update operation accordingly. Vertices are of two types and have around 10 to 15 properties on them, whereas edges are of two types and have only 3 properties on them. Service consumes sub-graph messages from 16 kafka threads in parallel and performs read/update/create operations concurrently.
We are using Graph API (blueprints implementation of orient graph). Currently to process one sub-graph we make use of one graph transactional graph instance from the pool. The pool have around 250 instances which connects with database remotely. After processing all the required operations of update/create the transaction is committed only once.
Performance bottlenecks
We currently are able to process only about 15K sub-graph message per min with 16 threads and OrientDB v2.2.35 running in distributed mode.
Performance hit due to concurrency: While processing the sub-graph message on single thread it takes about 10ms (about 35 reads, 15 update and 20 create operations) but it increases exponentially when done concurrently to 20-45ms.
Horizontally scaling up orient database: A. Not able to run OrientDB v3.0.3 in distributed mode due to this bug : https://github.com/orientechnologies/orientdb/issues/8427 B. Tried with
OrientDB 2.2.35it worked. But withstandaloneOrientDB and concurrent calls it takes about20-45msto process sub-graph. Whereas, indistributed modeit takes about100-110 mson average with 2 server nodes both running as master. Though it is excepted to take more time aswriteQuromis set tomajority,but is there anything that could be done about it to reduces it somewhat.Slow edge creation: We have notice that edge creation is heavy operation with Graph API as it requires reference of both the vertices and thus involves read of these vertices as well to get references within active transaction.
Things we have tried till now:
Tuning Graph API
- Lookup on
indexesfor vertices and edges - Declaring
massive insertion intentsomewhat improvement on write but performance hit on reads as it disables the local cache. - Using well defined
schema. - Switching
off data validation: no noticeable improvement - Disabling
client level cachethough reduces JVM consumption but is important for performance - Disabling the
transactional logs: no noticeable improvement.
Tuning distributed database cluster:
- Using
Master-Replicamodel no noticeable improvement. Load balancing: With concurrent call to database and load balancing strategy set toROUND_ROBINwas getting lots of concurrent modification exceptions and thus causing increased re-tries.- Explored out of box
shardingbox but due its limitation of not being able to maintain unique indexes can’t have it in our use case.
Upsert to create edges
- We tried using upsert with edges but it seems to be broken and requires a fix at this point. ( https://github.com/orientechnologies/orientdb/issues/8424).