I'm trying to run algorithms on Neo4j's Aura DS databases.

It seems like I've mostly understood how to connect to an Aura DS database, project a particular graph, and then apply one of the algorithms from the graphdatascience (GDS) library to do node classification or solve some other machine learning problem.

However, can I connect to an Aura DS database, retrieve the data as a pandas DataFrame, tensor, NumPy array, etc., and train with libraries other than GDS?

Apologies if this is trivial. I've tried searching for this, but got no satisfactory answer.

Both AuraDS and AuraDB accept Cypher queries from the official Neo4j Python driver, so you can fetch the node properties with Cypher and build a pandas DataFrame from the result.
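Since you already use the graphdatascience client, you may not need the separate driver at all: its `run_cypher` method sends a Cypher query to the database and returns the result as a pandas DataFrame. The sketch below assumes a connected `GraphDataScience` object (created with `aura_ds=True` for AuraDS); the helper name `nodes_to_df` and the `Movie` label with `title`/`released` properties are hypothetical placeholders.

```python
def nodes_to_df(gds, label, properties, limit=None):
    """Fetch the given properties of all nodes with `label` as a pandas DataFrame.

    `gds` is assumed to be a connected graphdatascience.GraphDataScience object.
    `label` and `properties` are interpolated into the query string because Cypher
    does not accept labels or property keys as parameters: pass trusted values only.
    """
    # Backtick-quote identifiers so names with spaces or special characters stay valid
    return_clause = ", ".join(f"n.`{p}` AS `{p}`" for p in properties)
    query = f"MATCH (n:`{label}`) RETURN {return_clause}"
    params = {}
    if limit is not None:
        # Values (unlike labels and property keys) can be sent as query parameters
        query += " LIMIT $limit"
        params["limit"] = limit
    # run_cypher returns a DataFrame with one column per RETURN alias
    return gds.run_cypher(query, params=params)
```

Usage, assuming a reachable AuraDS instance: create the client with `gds = GraphDataScience("neo4j+s://<AuraDBId>.databases.neo4j.io", auth=("neo4j", password), aura_ds=True)`, then call `df = nodes_to_df(gds, "Movie", ["title", "released"])`.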

The main challenge is that a pandas DataFrame needs a fixed set of columns, whereas Neo4j is schema-optional: two nodes with the same label can have different sets of properties.
So a practical approach is to fetch the property keys of one known node and generate the Cypher RETURN clause from them. Nodes that lack one of those properties get a null (None/NaN) in that column, and properties that the reference node lacks are not fetched at all.
Alternatively, you could hardcode the specific properties and aliases in the 'nodesQuery' Cypher statement, but this gets tedious if you have many columns/properties to fetch.
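Another way around the schema mismatch, if you are fine with fetching every property of every node, is to return each node's whole property map with `properties(n)` and let pandas align the columns: a DataFrame built from a list of dicts takes the union of all keys and fills the gaps with NaN. A minimal sketch; the helper name `props_records_to_df`, the `Movie` label and the sample rows are made up for illustration.

```python
import pandas as pd  # pip install pandas

# Query assumed here (Movie is a placeholder label):
#   MATCH (n:Movie) RETURN properties(n) AS props
# session.run(...).data() then yields rows shaped like [{"props": {...}}, ...].


def props_records_to_df(records, key="props"):
    """Build a DataFrame from rows that each hold one node's property map.

    Columns are the union of all property keys across nodes; a node that
    lacks a property gets NaN in that column.
    """
    return pd.DataFrame([row[key] for row in records])


# Hand-written sample rows (no database needed)
sample = [
    {"props": {"title": "The Matrix", "released": 1999}},
    {"props": {"title": "Untitled", "tagline": "tbd"}},
]
print(props_records_to_df(sample))
```

The trade-off is transfer size: this pulls properties you may not need, whereas an explicit RETURN clause fetches only the listed columns.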

```python
from neo4j import GraphDatabase  # pip install neo4j
import pandas as pd  # pip install pandas

AuraDBId = 'ENTER DB ID HERE'
dbUsername = 'ENTER USERNAME HERE'  # Default is neo4j for Aura
password = 'ENTER YOUR PASSWORD HERE'
# neo4j+s:// = encrypted connection with full certificate verification (Aura uses CA-signed certificates)
boltUrl = f"neo4j+s://{AuraDBId}.databases.neo4j.io:7687"

nodeV = 'n'
yourLabel = 'Movie'  # Replace this with your actual label

with GraphDatabase.driver(boltUrl, auth=(dbUsername, password)) as graphDBDriver:
    graphDBDriver.verify_connectivity()
    with graphDBDriver.session() as session:
        # Fetch the property keys of one node; they are used as the reference list of columns.
        # Add a WHERE clause to pick a specific reference node if required.
        columnsQuery = f"MATCH ({nodeV}:`{yourLabel}`) RETURN keys({nodeV}) AS cols LIMIT 1"
        record = session.run(columnsQuery).single()  # None if no node has this label
        if record is None:
            raise ValueError(f"No nodes with label {yourLabel} found")
        cols = record["cols"]

        # Generate "n.`title` AS `title`, ..."; backticks keep property names
        # with spaces or special characters valid
        returnString = ', '.join(f'{nodeV}.`{col}` AS `{col}`' for col in cols)

        nodesQuery = f"MATCH ({nodeV}:`{yourLabel}`) RETURN {returnString}"
        nodeResList = session.run(nodesQuery).data()  # List of dicts, one per node

# The driver is closed automatically when the with block exits
graphDF = pd.DataFrame(nodeResList)
```

Replace AuraDBId, dbUsername, password and yourLabel with your own values. Once the data is in a DataFrame, it is business as usual: you can hand it to any Python library (scikit-learn, PyTorch, etc.) instead of GDS.
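For the "use other libraries" part of the question: most Python ML libraries take NumPy arrays, which a DataFrame produces with `to_numpy()`. A minimal sketch, assuming numeric feature columns and that filling missing properties with a constant is acceptable for your model; the helper `to_features` and the column names are hypothetical.

```python
import numpy as np  # pip install numpy
import pandas as pd  # pip install pandas


def to_features(df, feature_cols, label_col=None, fill_value=0.0):
    """Convert a node DataFrame into NumPy arrays for a non-GDS ML library.

    X: float64 matrix of the chosen feature columns, missing values -> fill_value.
    y: 1-D array of the label column, or None if no label column is given.
    """
    X = df[feature_cols].astype(np.float64).fillna(fill_value).to_numpy()
    y = df[label_col].to_numpy() if label_col is not None else None
    return X, y


# Hypothetical node data standing in for graphDF; the second node has no "released" property
nodes = pd.DataFrame([
    {"released": 1999, "votes": 5, "genre": "sci-fi"},
    {"votes": 2, "genre": "drama"},
])
X, y = to_features(nodes, ["released", "votes"], label_col="genre")
print(X.shape, y)
```

From here, X and y can go into, say, a scikit-learn estimator's `fit(X, y)`, or into PyTorch via `torch.from_numpy(X)`. Filling with a constant is only one option; dropping incomplete rows or imputing may suit your data better.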

Caveats:

  1. Hardcoding DB credentials in your code is simple and short, but not recommended. Store your credentials separately and load them in the code that uses them: for example, keep them in a .env file and load it with load_dotenv() from the python-dotenv package, or store them in a JSON file and read it with json.load().
  2. Neo4j's recommended approach for interacting with the graph from an application driver is to use managed transactions and parameterized queries in the session. I skipped these because the use case seems simple and infrequent enough. If your workload is heavier, use managed transactions and parameterized queries.
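A minimal sketch of both caveats together, assuming the neo4j Python driver 5.x (where a managed read transaction is `session.execute_read`; driver 4.x calls it `read_transaction`) and hypothetical environment variables `AURA_URI`, `AURA_USERNAME` and `AURA_PASSWORD`. If you keep these in a .env file, calling `load_dotenv()` from python-dotenv first copies them into `os.environ`. The `Movie` label, the `released` property and the helper names are placeholders. Cypher parameters cover values only: labels and property keys cannot be parameterized, so they still have to come from trusted code.

```python
import os


def load_aura_credentials(env=os.environ):
    """Read connection details from (hypothetically named) environment variables."""
    names = ("AURA_URI", "AURA_USERNAME", "AURA_PASSWORD")
    missing = [n for n in names if n not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return env["AURA_URI"], env["AURA_USERNAME"], env["AURA_PASSWORD"]


def fetch_nodes(tx, label, min_released):
    """Transaction function: the filter value is sent as a query parameter.

    The label cannot be a parameter, so it is interpolated and must be trusted.
    """
    query = (
        f"MATCH (n:`{label}`) WHERE n.released >= $min_released "
        "RETURN properties(n) AS props"
    )
    result = tx.run(query, min_released=min_released)
    # Materialize inside the transaction: the result cannot be consumed afterwards
    return [record["props"] for record in result]


def main():
    from neo4j import GraphDatabase  # pip install neo4j
    import pandas as pd  # pip install pandas

    uri, user, password = load_aura_credentials()
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # Managed transaction: the function is retried on transient errors
            rows = session.execute_read(fetch_nodes, "Movie", 2000)
    return pd.DataFrame(rows)
```

With the variables set and a reachable database, `main()` returns the matching nodes' properties as a DataFrame.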