Performance issues while scaling up with OrientDB

748 Views Asked by At

We are trying to scale up OrientDB in distributed mode but are facing performance bottle neck.

Our use case

Data model and Processing methodology:

We have sub-graph message that we are reading from kafka and persisting/appending to the graph in database. A sub-graph message can multiple vertices/edges that are to be saved/updated in graph. These vertices/edges could or couldn’t exist with the database so we have to query database first they exists or not. If they exists, we have to check if any property is changed or not and have to perform a update operation accordingly. Vertices are of two types and have around 10 to 15 properties on them, whereas edges are of two types and have only 3 properties on them. Service consumes sub-graph messages from 16 kafka threads in parallel and performs read/update/create operations concurrently.

We are using Graph API (blueprints implementation of orient graph). Currently to process one sub-graph we make use of one graph transactional graph instance from the pool. The pool have around 250 instances which connects with database remotely. After processing all the required operations of update/create the transaction is committed only once.

Performance bottlenecks

We currently are able to process only about 15K sub-graph message per min with 16 threads and OrientDB v2.2.35 running in distributed mode.

  1. Performance hit due to concurrency: While processing the sub-graph message on single thread it takes about 10ms (about 35 reads, 15 update and 20 create operations) but it increases exponentially when done concurrently to 20-45ms.

  2. Horizontally scaling up orient database: A. Not able to run OrientDB v3.0.3 in distributed mode due to this bug : https://github.com/orientechnologies/orientdb/issues/8427 B. Tried with OrientDB 2.2.35 it worked. But with standalone OrientDB and concurrent calls it takes about 20-45ms to process sub-graph. Whereas, in distributed mode it takes about 100-110 ms on average with 2 server nodes both running as master. Though it is excepted to take more time as writeQurom is set to majority,but is there anything that could be done about it to reduces it somewhat.

  3. Slow edge creation: We have notice that edge creation is heavy operation with Graph API as it requires reference of both the vertices and thus involves read of these vertices as well to get references within active transaction.

Things we have tried till now:

Tuning Graph API

  1. Lookup on indexes for vertices and edges
  2. Declaring massive insertion intent somewhat improvement on write but performance hit on reads as it disables the local cache.
  3. Using well defined schema.
  4. Switching off data validation: no noticeable improvement
  5. Disabling client level cache though reduces JVM consumption but is important for performance
  6. Disabling the transactional logs: no noticeable improvement.

Tuning distributed database cluster:

  1. Using Master-Replica model no noticeable improvement.
  2. Load balancing: With concurrent call to database and load balancing strategy set to ROUND_ROBIN was getting lots of concurrent modification exceptions and thus causing increased re-tries.
  3. Explored out of box sharding box but due its limitation of not being able to maintain unique indexes can’t have it in our use case.

Upsert to create edges

  1. We tried using upsert with edges but it seems to be broken and requires a fix at this point. ( https://github.com/orientechnologies/orientdb/issues/8424).
0

There are 0 best solutions below