the-file-name.csv is on the previous question How to make the Cypher request to handle those both cases without duplicating Nodes
On the first step I do
CREATE CONSTRAINT ON (r:Region) ASSERT r.region IS UNIQUE;
CREATE CONSTRAINT ON (c:City) ASSERT c.cityName IS UNIQUE;
CREATE CONSTRAINT ON (s:Sector) ASSERT s.sectorName IS UNIQUE;
Here I put the cypher query.
LOAD CSV WITH HEADERS FROM "file:///the-file-name.csv" as line
FIELDTERMINATOR ','
with line as line
OPTIONAL MATCH (n:Region)
WHERE n.region contains "BLANKEMPTYVIDE" AND n.identifier= line.CODE_TER
WITH n, line
CALL apoc.do.when(
n IS NULL,
'MERGE (r:Region{region: line.TERRITOIRE}) ON CREATE SET r.description=line.TERRITOIRE ON CREATE SET r.identifier=line.CODE_TER ON CREATE SET r.id = toString(id(r)) RETURN r',
'RETURN n AS r',
{n: n, line: line}) YIELD value
WITH value.r AS r, line
SET r.identifier=line.CODE_TER
SET r.description=line.TERRITOIRE
SET r.region=line.TERRITOIRE
WITH r,line
OPTIONAL MATCH (ci:City) where ci.cityName contains "BLANKEMPTYVIDE" AND ci.regionIdentifier= line.CODE_TER
CALL apoc.do.when(
ci IS NULL,
'MERGE (c:City {cityName:line.BRICK}) ON CREATE SET c.identifier=line.CODE_BRICK ON CREATE SET c.region=line.TERRITOIRE ON CREATE SET c.regionIdentifier=line.CODE_TER ON CREATE SET c.zip=line.CODE_BRICK ON CREATE SET c.description=line.BRICK ON CREATE SET c.id = toString(id(c)) RETURN c',
'RETURN ci AS c',
{ci: ci, line: line}) YIELD value
WITH value.c AS c,line,r
SET c.identifier=line.CODE_BRICK
SET c.region=line.TERRITOIRE
SET c.regionIdentifier=line.CODE_TER
SET c.zip=line.CODE_BRICK
SET c.description=line.BRICK
SET c.cityName=line.BRICK
WITH c,r,line
MATCH (c {identifier:line.CODE_BRICK}),(r {identifier:line.CODE_TER})
MERGE (c)-[:IS_A_City_BELONGING_TO]->(r)
WITH c,r,line
OPTIONAL MATCH (sec:Sector)
WHERE sec.description contains "BLANKEMPTYVIDE" AND sec.regionIdentifier=line.CODE_TER
CALL apoc.do.when(
sec IS NULL,
'MERGE (s:Sector {sectorName:line.SOUSBRICK}) ON CREATE SET s.identifier=line.CODE_SBRICK ON CREATE SET s.region=line.TERRITOIRE ON CREATE SET s.regionIdentifier=line.CODE_TER ON CREATE SET s.city=line.BRICK ON CREATE SET s.cityIdentifier=line.CODE_BRICK ON CREATE SET s.description=line.SOUSBRICK ON CREATE SET s.zip=line.SOUSBRICK ON CREATE SET s.id = toString(id(s)) RETURN s',
'RETURN sec AS s',
{sec: sec, line: line}) YIELD value
WITH value.s AS s,line,c,r
SET s.identifier=line.CODE_SBRICK
SET s.region=line.TERRITOIRE
SET s.regionIdentifier=line.CODE_TER
SET s.city=line.BRICK
SET s.cityIdentifier=line.CODE_BRICK
SET s.description=line.SOUSBRICK
SET s.zip=line.SOUSBRICK
SET s.sectorName = line.SOUSBRICK
WITH s,c,r,line
MATCH (s {identifier:line.CODE_SBRICK}),(r{identifier:line.CODE_TER})
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_REGION]->(r)
WITH s,c,r,line
MATCH (s {identifier:line.CODE_SBRICK}),(c{identifier:line.CODE_BRICK})
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_CITY]->(c)
The result :
Set 28271 properties, created 3400 relationships, completed after 34985 ms.
It took nearby 35 sec to load csv to neo4j db. please is it right perfect in performance view ?
for a csv file containing 1665 csv lines each line contains 6 fields : two first fields for (Region) the third and fourth (City) and the fifth and sixth for (Sector).
How to optimize this neo4j query so that it takes just few seconds below 3 seconds ?
Here below I put the profiler result picture.
Here I made the update upon advice from InverseFalcon
LOAD CSV WITH HEADERS FROM "file:///the-file-name.csv" as line
FIELDTERMINATOR ','
with line as line
OPTIONAL MATCH (n:Region)
WHERE n.region contains "BLANKEMPTYVIDE" AND n.identifier= line.CODE_TER
WITH n, line
CALL apoc.do.when(
n IS NULL,
'MERGE (r:Region{region: line.TERRITOIRE}) ON CREATE SET r.description=line.TERRITOIRE ON CREATE SET r.identifier=line.CODE_TER ON CREATE SET r.id = toString(id(r)) RETURN r',
'SET n.identifier=line.CODE_TER SET n.description=line.TERRITOIRE SET n.region=line.TERRITOIRE RETURN n AS r',
{n: n, line: line}) YIELD value
WITH value.r AS r, line
OPTIONAL MATCH (ci:City) where ci.cityName contains "BLANKEMPTYVIDE" AND ci.regionIdentifier= line.CODE_TER
CALL apoc.do.when(
ci IS NULL,
'MERGE (c:City {cityName:line.BRICK}) ON CREATE SET c.identifier=line.CODE_BRICK ON CREATE SET c.region=line.TERRITOIRE ON CREATE SET c.regionIdentifier=line.CODE_TER ON CREATE SET c.zip=line.CODE_BRICK ON CREATE SET c.description=line.BRICK ON CREATE SET c.id = toString(id(c)) RETURN c',
'SET ci.identifier=line.CODE_BRICK SET ci.region=line.TERRITOIRE SET ci.regionIdentifier=line.CODE_TER SET ci.zip=line.CODE_BRICK SET ci.description=line.BRICK SET ci.cityName=line.BRICK RETURN ci AS c',
{ci: ci, line: line}) YIELD value
WITH value.c AS c,line,r
MERGE (c)-[:IS_A_City_BELONGING_TO]->(r)
WITH c,r,line
OPTIONAL MATCH (sec:Sector)
WHERE sec.description contains "BLANKEMPTYVIDE" AND sec.regionIdentifier=line.CODE_TER
CALL apoc.do.when(
sec IS NULL,
'MERGE (s:Sector {sectorName:line.SOUSBRICK}) ON CREATE SET s.identifier=line.CODE_SBRICK ON CREATE SET s.region=line.TERRITOIRE ON CREATE SET s.regionIdentifier=line.CODE_TER ON CREATE SET s.city=line.BRICK ON CREATE SET s.cityIdentifier=line.CODE_BRICK ON CREATE SET s.description=line.SOUSBRICK ON CREATE SET s.zip=line.SOUSBRICK ON CREATE SET s.id = toString(id(s)) RETURN s',
'SET sec.identifier=line.CODE_SBRICK SET sec.region=line.TERRITOIRE SET sec.regionIdentifier=line.CODE_TER SET sec.city=line.BRICK SET sec.cityIdentifier=line.CODE_BRICK SET sec.description=line.SOUSBRICK SET sec.zip=line.SOUSBRICK SET sec.sectorName = line.SOUSBRICK RETURN sec AS s',
{sec: sec, line: line}) YIELD value
WITH value.s AS s,line,c,r
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_REGION]->(r)
WITH s,c,r,line
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_CITY]->(c)
added index on description for sector constrains becomes
CREATE CONSTRAINT ON (r:Region) ASSERT r.region IS UNIQUE;
CREATE CONSTRAINT ON (c:City) ASSERT c.cityName IS UNIQUE;
CREATE CONSTRAINT ON (s:Sector) ASSERT s.sectorName IS UNIQUE;
CREATE CONSTRAINT ON (s:Sector) ASSERT s.description IS UNIQUE;
Plan from profiler
Cypher version: CYPHER 3.4, planner: COST, runtime: INTERPRETED. 175350 total db hits in 14321 ms****
Directely without Profiler Created 3403 relationships, completed after 12702 ms.
Good enhancement from 36 sec to 12 sec but still need optimization.
The operations with the largest DB hits reference a NodeByLabelScan followed by a filter, yielding only 1663 rows or results from the starting 1.3 million or so nodes of the label. This seems like an opportunity to add an index, if possible.
You haven't expanded the nodes in the query plan, so we can't see any info on which part of the query those db hits are associated with, but if I were to guess it would be from here:
I don't see any evidence you've created a constraint or index on :Sector(description) or :Sector(regionIdentifier), so this is probably where the planner was forced to use a NodeByLabelScan, which as you can see is expensive. You're going to want an index on one of these, or a unique constraint (depending on if any of these properties is unique to :Sector).
Separately, you're doing a curious thing in several places throughout your query that looks like this:
The variables you're using in your MATCH are already bound, so I would suggest using a WHERE clause for these instead to make it clear that this is meant to be a filtering step:
That said, in all of these cases you've already explicitly set those properties to the line properties in question, so there's no need to perform this kind of filtering anyway, it's unnecessary. I'd suggest removing those matches and just MERGE the relationships.