I'm working with the Allen Brain mouse RNA-seq data. From the provided dend.json file, I want to create a dictionary where each key is a parent node and the value is the list of nodes that the parent splits into. You can see the dendrogram here.
The dictionary from loading the json file looks like this:
{'node_attributes': [{'height': 0.8416,
'members': 290,
'edgePar.col': '#000000',
'edgePar.lwd': 2,
'edgePar.conf': 1,
'label': '',
'midpoint': 256.4472,
'cell_set_accession': 'CS1910120323',
'cell_set_alias': '',
'cell_set_designation': 'Neuron/Non-Neuron',
'X': '291',
'node_id': 'n1'}],
'children': [{'node_attributes': [{'height': 0.6271,
'members': 279,
'edgePar.col': '#000000',
'edgePar.lwd': 2,
'edgePar.conf': 1,
'label': '',
'midpoint': 226.7537,
'cell_set_accession': 'CS1910120324',
'cell_set_alias': '',
'cell_set_designation': 'Neuron/Non-Neuron',
'X': '292',
'node_id': 'n2'}],
'children': [{'node_attributes': [{'height': 0.365,
'members': 271,
'edgePar.col': '#000000',
'edgePar.lwd': 2,
'edgePar.conf': 1,
'label': '',
'midpoint': 178.695,
'cell_set_accession': 'CS1910120325',
'cell_set_alias': '',
'cell_set_designation': 'Neuron 001-271',
'X': '293',
'node_id': 'n3'}],............
Here dictionary['children'][0] follows the left split and, if there are two splits at a node, dictionary['children'][1] follows the right split.
I want the form of the output to be something like:
{n1 : [n2, n281],
n2 : [n3, n284],...}
At the moment, I'm just able to parse the dictionary and return the nodes using code adapted from another post:
def walk(d):
    for k, v in d.items():
        if isinstance(v, (str, int, float)):
            # Scalar value: report it only if it is the node identifier.
            if k == 'node_id':
                print('node:', v)
        elif isinstance(v, list):
            # 'node_attributes' and 'children' are lists of dicts; recurse into each.
            for item in v:
                walk(item)

walk(dend)
Output:
node: n1
node: n2
node: n3
node: n4
node: n183
node: n184
node: n185
This might be close to what you want.
https://github.com/danielsf/AllenInstTools_by_SFD/blob/master/parse_dendrogram.py
It creates a class CellNode that stores, for each node in the dendrogram, the name (the cell_set_accession) of the node, as well as lists of the names for all of the ancestors, children (immediate children) and ultimate children (all nodes descended from the current node) in the tree. The method build_tree will return a dict keyed on the cell_set_accession, whose values are the CellNode for that node.
If you don't like using cell_set_accession as the name for the nodes, you can change that at line 120 of the script.
If you want more or less information in your dict, you can identify leaf nodes because they will return empty lists for node.children.
The code was good enough for my purposes (which is a nice way of saying I haven't rigorously tested it). Feel free to reach out if something doesn't work as expected.
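If you only need the parent-to-children dict from the question, a short recursive walk may be enough. This is a minimal sketch and not the code from the linked script. The helper names node_name and build_children_map are made up for illustration. It assumes every node keeps its fields in a one-element 'node_attributes' list, as in the dump above. It also assumes leaves may use a 'leaf_attributes' list instead and may lack a 'node_id', in which case it falls back to 'cell_set_accession' or 'label'. Check those assumptions against your copy of dend.json.

```python
def node_name(node):
    # Assumption: attributes live in a one-element list under 'node_attributes'
    # (internal nodes, as in the question) or 'leaf_attributes' (leaves, assumed).
    attrs = (node.get('node_attributes') or node.get('leaf_attributes') or [{}])[0]
    # Assumption: fall back to other identifiers when 'node_id' is missing.
    return attrs.get('node_id') or attrs.get('cell_set_accession') or attrs.get('label')


def build_children_map(node, mapping=None):
    # Map each parent's name to the names of its immediate children.
    if mapping is None:
        mapping = {}
    children = node.get('children', [])
    if children:
        mapping[node_name(node)] = [node_name(child) for child in children]
        for child in children:
            build_children_map(child, mapping)
    return mapping


# Toy tree with the same nesting as the dump in the question (made-up values).
toy = {
    'node_attributes': [{'node_id': 'n1'}],
    'children': [
        {
            'node_attributes': [{'node_id': 'n2'}],
            'children': [
                {'leaf_attributes': [{'label': 'leafA'}]},
                {'leaf_attributes': [{'label': 'leafB'}]},
            ],
        },
        {'leaf_attributes': [{'label': 'leafC'}]},
    ],
}

print(build_children_map(toy))
```

On the real file you would call build_children_map(dend). Nodes without children never become keys, so leaves appear only inside the value lists.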