I'm using LOAD CSV to import data from csv to neo4j. My dataset contains multiple values in the country field. Currently I'm using a semicolon as the separator of those multiple values.
nodes-person.csv
id,country
http://author,country1;country2;country3
And this is the cypher query which I use to import data into neo4j
LOAD CSV WITH HEADERS FROM "file:///nodes-person.csv" AS csvLine
MERGE (p:`person` {id: csvLine.id})
ON CREATE
SET
p.country = split(csvLine.country,";")
ON MATCH
SET
p.country = split(csvLine.country,";")
RETURN p;
My question is, how can I split the values properly if the values contain the separator character.
ie:
country\\;1 ; country\\;2 ; country\\;3
You've got a couple of options - one is pure Cypher and slightly untidy, the other is using APOC and regular expressions. I'm making the assumption that if the semicolon appears within a country name it's escaped with a single backslash.
Cypher route
The plan here is to do three replacements:
__SEMICOLON__)__SEMICOLON__instances with a semicolon characterSomething like the following would work (the
WITHis just so it's runnable in isolation):APOC and Regular Expressions
A tidier approach is to use
apoc.text.replaceand supply a regular expression as the 'separator', where we want to split the string by semicolons that are not preceded by the backslash escape character:We do a final tidy-up to replace the escaped semicolons with plain semicolons for storage with that list comprehension. The regex is shamelessly stolen from this answer.