Create AWS Neptune graph from raw CSV


I saw a lot of tutorials about how to load CSV (Gremlin) data in the format of vertices and edges into AWS Neptune. For a lot of reasons, I cannot create vertex and edge files for data loading. Instead, I have just the raw CSV file, where each row is a record (e.g. a person).

How can I create nodes and relationships from each row of the raw CSV in Neptune from the notebook interface?


Given you mentioned wanting to do this in the notebooks, the examples below are all run from inside a Jupyter notebook. I don't have the data sets you mentioned to hand, so let's make a simple one in a notebook cell:

%%bash
echo "code,city,region
AUS,Austin,US-TX
JFK,New York,US-NY" > test.csv

We can then generate the openCypher CREATE steps for the nodes contained in that CSV file using a simple cell such as:

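import csv

with open('test.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, escapechar="\\")
    query = ""
    for row in reader:
        # One CREATE clause per CSV row; every column becomes a string property.
        # Values are interpolated as-is, so a value containing a double quote
        # would produce an invalid query.
        props = ", ".join(f'{k}:"{v}"' for k, v in row.items())
        query += f"CREATE (:Airport {{{props}}})\n"
    print(query)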

which yields:

CREATE (:Airport {code:"AUS", city:"Austin", region:"US-TX"})
CREATE (:Airport {code:"JFK", city:"New York", region:"US-NY"})
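If your real data can contain double quotes or backslashes, the generated clause would need those characters escaped. The following is a minimal sketch of one way to do that. The helper names `cypher_string` and `row_to_create` are made up for illustration, and the sample row is hypothetical. It does not handle column names that contain spaces or other special characters, which would need backtick quoting.

```python
def cypher_string(value):
    """Return value as a double-quoted openCypher string literal."""
    # Escape backslashes first, then double quotes, so the backslashes
    # added for the quotes are not escaped a second time.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_create(label, row):
    """Build one CREATE clause from a dict of column name -> value."""
    props = ", ".join(f"{key}:{cypher_string(val)}" for key, val in row.items())
    return f"CREATE (:{label} {{{props}}})"


# Hypothetical row whose city value contains double quotes.
print(row_to_create("Airport", {"code": "XXX", "city": 'The "Big" Airport'}))
```

In the loop above, `row_to_create("Airport", row)` could stand in for the hand-built string.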

Finally, let's have the notebook's %%oc cell magic run that query for us:

ipython = get_ipython()
magic = ipython.run_cell_magic
magic(magic_name = "oc", line='', cell=query)
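With a larger CSV, a single query string holding every CREATE clause may become unwieldy. I have not checked any specific limit, so treat this as a precaution rather than a requirement. A rough sketch of splitting the clauses into smaller batches follows. The `batch_queries` helper, the batch size, and the sample clauses are all assumptions made for illustration.

```python
def batch_queries(clauses, batch_size=100):
    """Join CREATE clauses into query strings of at most batch_size clauses."""
    return [
        "\n".join(clauses[i:i + batch_size])
        for i in range(0, len(clauses), batch_size)
    ]


# Made-up clauses standing in for the ones generated from the CSV.
clauses = [f'CREATE (:Airport {{code:"A{n}"}})' for n in range(5)]
batches = batch_queries(clauses, batch_size=2)
print(len(batches))

# Inside a notebook, each batch could then go through the same magic call:
# for q in batches:
#     get_ipython().run_cell_magic("oc", "", q)
```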

To verify that the query worked:

%%oc
MATCH (a:Airport)
RETURN a.code, a.city

which returns:

    a.code     a.city
1   AUS        Austin
2   JFK        New York
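The question also asks about relationships. One possible approach, sketched here under assumptions, is to generate a second set of queries that match each node created earlier and connect it to another node derived from one of the columns. The `Region` label, the `IN_REGION` relationship type, and the `region_queries` helper are hypothetical names chosen for this example. MERGE is used so that a region shared by several rows is only created once.

```python
import csv
import io

# Same sample data as test.csv, inlined so the sketch is self-contained.
CSV_TEXT = "code,city,region\nAUS,Austin,US-TX\nJFK,New York,US-NY\n"


def region_queries(csv_text):
    """Return one openCypher query per row linking an Airport to its Region."""
    queries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["code"]
        region = row["region"]
        # One query per row, so the variables a and r never collide.
        queries.append(
            f'MATCH (a:Airport {{code:"{code}"}})\n'
            f'MERGE (r:Region {{code:"{region}"}})\n'
            f'MERGE (a)-[:IN_REGION]->(r)'
        )
    return queries


for q in region_queries(CSV_TEXT):
    print(q, end="\n\n")
```

Each returned string could then be passed to the oc cell magic in the same way as the CREATE query.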

There are many ways you could do this, but this is a simple way if you want to stay inside the notebooks. Given your question does not have a lot of detail or an example of what you have tried so far, hopefully this gives you some pointers.