I have a directed graph structure in my database model, and need to load many ( > 10.000) records. The database is not on the same server as the code, causing a network delay of about 2 ms per query.
My database model is similar to https://docs.sqlalchemy.org/en/20/_modules/examples/graphs/directed_graph.html
I want to load a list of nodes and for each node I need all edges.
This snippet is more ore less the code I would like to execute.
node_list = [1, 2, 3, ... , 1000]
for node in session.query(Node).filter(Node.id.in_(node_list)).all():
print(f'{node} , {node.higher_neighbors()} , {node.lower_neighbors()}')
For 1000 records, this results in 1 query to load the 1000 nodes, and then 2 queries for each node to load the neighbors. (2N + 1) queries in total. With a cost of 2 ms per query, this code takes about 4 seconds more then nessesary when loading 1000 records.
How should I change the model and/or query to load this data in a single, or a few queries?
This is my actual code. I originally used an association proxy, but this makes no difference in the amount of queries compared to the SQLAlchemy graph example. A postgres database is used.
class Job(Base):
__tablename__ = 'job'
id: Mapped[int] = mapped_column(primary_key=True)
job_name: Mapped[str] = mapped_column(String(256))
parents = association_proxy("job_parents", "parent")
children = association_proxy("job_children", "child")
logs: Mapped[List[JobLog]] = relationship(order_by=TaskLog.id)
def add_child(self, child, relation):
JobRelation(parent=self, child=child, relation=relation)
def add_parent(self, parent, relation):
JobRelation(parent=parent, child=self, relation=relation)
class JobRelation(Base):
__tablename__ = 'job_relation'
parent_id: Mapped[int] = mapped_column(ForeignKey('job.id'), primary_key=True)
child_id: Mapped[int] = mapped_column(ForeignKey('job.id'), primary_key=True)
relation: Mapped[str] = mapped_column(String(256))
parent = relationship(Job, foreign_keys=parent_id, backref="job_children")
child = relationship(Job, foreign_keys=child_id, backref="job_parents")
def __init__(self, parent: Job, child: Job, relation: str):
self.parent = parent
self.child = child
self.relation = relation
job_list = json['job_list']
data = {}
for job in session.query(Job).filter(job.id.in_(job_list)).all():
data[job.id] = {
'children' = [child.id for child in job.children]
'job_name' = job.job_name
'logs' = job.logs
'parents' = [parent.id for parent in job.parents]