Crash during a 2-hop query operation in Neo4j

I have a question about how to deal with a Neo4j query that runs slowly and then crashes.

First of all, let me explain the graph. There are 20 million nodes labeled "Party", each with the properties "id" and "type". Each Party is connected to one or more "Account" nodes (70 million in total) by a relationship called "HAS". Accounts are connected to each other by a relationship called "TRANSACTION" (350 million in total), which has the properties "timestamp" and "price".
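
For illustration, here is a tiny, made-up sample of this schema (all ids, timestamps, and prices below are invented):

// Hypothetical sample data matching the schema described above
CREATE (p1:Party {id: 'P1', type: 'on'}),
       (p2:Party {id: 'P2', type: 'off'}),
       (p3:Party {id: 'P3', type: 'off'}),
       (a1:Account), (a2:Account), (a3:Account),
       (p1)-[:HAS]->(a1),
       (p2)-[:HAS]->(a2),
       (p3)-[:HAS]->(a3),
       (a1)-[:TRANSACTION {timestamp: '2023-01-15T10:00:00', price: 500}]->(a2),
       (a2)-[:TRANSACTION {timestamp: '2023-02-20T14:30:00', price: 800}]->(a3)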

Now, to calculate the impact of a Party whose type property is 'on' (say P1) on a Party two hops away (a Party whose type property is 'off', say P3), the following two-stage query is run.

CALL apoc.periodic.iterate("
  // P1-A1-A2
  MATCH (P1:Party {type:'on'})-[:HAS]-(A1:Account)-[t1:TRANSACTION]-(A2:Account)<-[:HAS]-(P2:Party {type:'off'})
  WITH A2,
       CASE
           WHEN datetime(t1.timestamp) >= datetime(reference_date) THEN 4
           // ... (sorry, it's too long to write out, but there is a complex
           //      case split that depends on the values of t1.timestamp and t1.price)
       END AS ...

  // A2-A3-P3
  MATCH (A2)-[t2:TRANSACTION]-(A3:Account)<-[:HAS]-(P3:Party {type:'off'})
  WITH P3,
       CASE
           WHEN datetime(t2.timestamp) >= datetime(reference_date) THEN 4
           // ... (this one also has a complex case split that depends on the values
           //      of t2.timestamp and t2.price; finally, the value of score is calculated)
       END AS ...

  RETURN P3.id AS id, score
",
  "CREATE (temp:TempResult {id: id, score: score})",
  {batchSize: 100, parallel: true}
)

CPU usage sits at around 30% at the beginning of the calculation, but it eventually climbs to 100% and the operation becomes very slow. Finally, the Neo4j Browser connection to the database is dropped.

What do you think is the reason for the disconnection from the database?

  1. Is the 2-hop traversal itself inefficient? (Does it use a lot of memory, or cause a lot of disk access? See the PROFILE sketch after this list.)
  2. Indexes are said to make queries more efficient, but I do not understand well how they should be created here (see the index sketch after this list).
  3. Is the batch size processed at one time inappropriate? (batchSize: 100; see the variant after this list.)
  4. Are the machine specs insufficient? I am also not sure the heap and cache settings are appropriate (see the sizing sketch after this list; my current settings are listed below).
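
For question 1, I intend to first profile the traversal from a single starting Party to see where the database hits explode (the id value below is hypothetical, just to limit the run to one start node):

PROFILE
MATCH (P1:Party {type:'on'})-[:HAS]-(A1:Account)-[t1:TRANSACTION]-(A2:Account)<-[:HAS]-(P2:Party {type:'off'})
WHERE P1.id = 'P1'  // hypothetical id
RETURN count(*)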
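
For question 2, these are the indexes I am thinking of creating (the index names are my own, and I am not sure they are the right choices; since "type" only takes two values, it may be too unselective to help much):

CREATE INDEX party_type IF NOT EXISTS FOR (p:Party) ON (p.type);
CREATE INDEX party_id IF NOT EXISTS FOR (p:Party) ON (p.id);
// Neo4j 5 also supports relationship property indexes:
CREATE INDEX tx_timestamp IF NOT EXISTS FOR ()-[t:TRANSACTION]-() ON (t.timestamp);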
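
For question 3, this is the variant I am considering: a larger batch size with parallel writes turned off, since I am not sure the parallel CREATE is safe or helpful here (the batch size is a guess to be tuned):

CALL apoc.periodic.iterate(
  "... the same read query as above ...",
  "CREATE (temp:TempResult {id: id, score: score})",
  {batchSize: 10000, parallel: false}
)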
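
For question 4, I notice that my heap (240g) plus page cache (240g) already consume 480g of the 512GB, leaving little for the OS, and that dbms.memory.transaction.total.max=400G is larger than the heap, which seems wrong if transaction state is held on the heap. A sizing sketch I am considering (these numbers are guesses, not a recommendation):

# smaller heap to avoid long GC pauses, larger page cache for the 350M relationships
server.memory.heap.initial_size=31g
server.memory.heap.max_size=31g
server.memory.pagecache.size=400g
# kept below the heap size
dbms.memory.transaction.total.max=20g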

The server and Neo4j specifications are as follows.

Server

  • OS: Windows Server 2019
  • CPU: Intel Xeon Platinum 2.90GHz 4core
  • Memory: 512GB
  • HDD: 1TB

Neo4j

  • Version: Enterprise v5.12.0
  • server.memory.heap.initial_size=240g
  • server.memory.heap.max_size=240g
  • server.memory.pagecache.size=240g
  • dbms.memory.transaction.total.max=400G
  • db.transaction.timeout=1440m

I would be grateful for any advice or information. Thank you in advance.
