I'm using LOAD CSV to import data from a CSV file into Neo4j. My dataset contains multiple values in the country field. Currently I'm using a semicolon as the separator for those values.
nodes-person.csv
id,country
http://author,country1;country2;country3
This is the Cypher query I use to import the data into Neo4j:
LOAD CSV WITH HEADERS FROM "file:///nodes-person.csv" AS csvLine
MERGE (p:`person` {id: csvLine.id})
ON CREATE SET p.country = split(csvLine.country, ";")
ON MATCH SET p.country = split(csvLine.country, ";")
RETURN p;
My question is: how can I split the values properly if a value itself contains the separator character?
For example:
country\\;1 ; country\\;2 ; country\\;3
You've got a couple of options: one is pure Cypher and slightly untidy, the other uses APOC and regular expressions. I'm assuming that if a semicolon appears within a country name, it's escaped with a single backslash.
Cypher route
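The plan here is to work in three steps:
1. Replace the escaped semicolons with a placeholder that is unlikely to appear in the data (for example __SEMICOLON__)
2. Split the string on the remaining semicolons, which are the real separators
3. Replace the __SEMICOLON__ instances with a semicolon character
Something like the following would work (the WITH is just so it's runnable in isolation):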
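A minimal sketch of that plan, assuming the escape is a single backslash and that the placeholder __SEMICOLON__ never occurs in the real data. The sample string and the names country, protected and part are made up for illustration.

```cypher
// Assumption: a semicolon inside a value is escaped as \;
// Inside a Cypher string literal, a single backslash is written as \\.
WITH 'country\\;1;country\\;2;country\\;3' AS country
// Step 1: hide the escaped semicolons behind a placeholder.
WITH replace(country, '\\;', '__SEMICOLON__') AS protected
// Step 2: split on the remaining semicolons (the real separators).
// Step 3: turn each placeholder back into a plain semicolon.
RETURN [part IN split(protected, ';') | replace(part, '__SEMICOLON__', ';')] AS countries
```

In the import itself, the same expression would take the place of split(csvLine.country, ";"), with csvLine.country substituted for the sample string.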
APOC and Regular Expressions
A tidier approach is to use apoc.text.split and supply a regular expression as the 'separator', so that the string is split only on semicolons that are not preceded by the backslash escape character. A final tidy-up with a list comprehension then replaces the escaped semicolons with plain semicolons for storage. The regex is shamelessly stolen from this answer.
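As a rough sketch of the APOC route, assuming the APOC library is installed and that the escape is a single backslash. The sample string and the name part are illustrative only. The pattern is a negative lookbehind: it matches a semicolon only when the preceding character is not a backslash.

```cypher
// Assumption: APOC is available, and escaped semicolons look like \;
WITH 'country\\;1;country\\;2;country\\;3' AS country
// The Cypher literal '(?<!\\\\);' is the regex (?<!\\); after string unescaping,
// i.e. "a semicolon not preceded by a backslash".
// The list comprehension then strips the escape from each value.
RETURN [part IN apoc.text.split(country, '(?<!\\\\);') | replace(part, '\\;', ';')] AS countries
```

To use it in the LOAD CSV statement, pass csvLine.country as the first argument to apoc.text.split and assign the resulting list to p.country.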