I'm working with Allen Brain's mouse RNA-seq data. From the dend.json file provided, I want to create a dictionary in which each key is a parent node and each value is the list of nodes that parent splits into. You can see the dendrogram here.

The dictionary from loading the json file looks like this:

{'node_attributes': [{'height': 0.8416,
   'members': 290,
   'edgePar.col': '#000000',
   'edgePar.lwd': 2,
   'edgePar.conf': 1,
   'label': '',
   'midpoint': 256.4472,
   'cell_set_accession': 'CS1910120323',
   'cell_set_alias': '',
   'cell_set_designation': 'Neuron/Non-Neuron',
   'X': '291',
   'node_id': 'n1'}],
 'children': [{'node_attributes': [{'height': 0.6271,
     'members': 279,
     'edgePar.col': '#000000',
     'edgePar.lwd': 2,
     'edgePar.conf': 1,
     'label': '',
     'midpoint': 226.7537,
     'cell_set_accession': 'CS1910120324',
     'cell_set_alias': '',
     'cell_set_designation': 'Neuron/Non-Neuron',
     'X': '292',
     'node_id': 'n2'}],
   'children': [{'node_attributes': [{'height': 0.365,
       'members': 271,
       'edgePar.col': '#000000',
       'edgePar.lwd': 2,
       'edgePar.conf': 1,
       'label': '',
       'midpoint': 178.695,
       'cell_set_accession': 'CS1910120325',
       'cell_set_alias': '',
       'cell_set_designation': 'Neuron 001-271',
       'X': '293',
       'node_id': 'n3'}],............

Here, dictionary['children'][0] follows the left split and, if a node splits in two, dictionary['children'][1] follows the right split.

I want the form of the output to be something like:

{n1 : [n2, n281],
 n2 : [n3, n284],...}

At the moment, I'm just able to parse the dictionary and return the nodes using code adapted from another post:

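import json

# Load the dendrogram; adjust the path to wherever dend.json is stored.
with open('dend.json') as f:
    dend = json.load(f)

def walk(d):
    """Recursively print every node_id found in the nested dendrogram dict."""
    for k, v in d.items():
        if isinstance(v, (str, int, float)):
            if k == 'node_id':
                print('node:', v)
        elif isinstance(v, list):
            # Lists (e.g. 'node_attributes', 'children') hold further dicts.
            for item in v:
                walk(item)

walk(dend)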

Output:
node: n1
node: n2
node: n3
node: n4
node: n183
node: n184
node: n185

Answer:

This might be close to what you want.

https://github.com/danielsf/AllenInstTools_by_SFD/blob/master/parse_dendrogram.py

It defines a class, CellNode, that stores for each node in the dendrogram the node's name (its cell_set_accession) together with lists of names for its ancestors, its children (immediate children only), and its ultimate children (all nodes descended from it). The method build_tree returns a dict keyed on cell_set_accession, whose values are the CellNode objects for the corresponding nodes.
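To give a rough idea of that kind of structure without opening the script, here is a minimal sketch. It is not the code from the linked repository: TreeNode, _attributes and build_lookup are hypothetical names, and the sketch assumes every node keeps its attributes in a one-element list under a key ending in _attributes (the excerpt in the question only shows node_attributes) and that the chosen name key is present on every node.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """Hypothetical stand-in for the CellNode idea; not the linked script's class."""
    name: str
    ancestors: list = field(default_factory=list)    # root first, parent last
    children: list = field(default_factory=list)     # immediate children only
    descendants: list = field(default_factory=list)  # every node below this one


def _attributes(node):
    # Assumption: attributes live in a one-element list under a key that ends
    # in '_attributes' ('node_attributes' is the only such key shown above).
    for key, value in node.items():
        if key.endswith("_attributes"):
            return value[0]
    raise KeyError("no *_attributes key found in node")


def build_lookup(node, name_key="cell_set_accession", ancestors=(), lookup=None):
    """Return {name: TreeNode} for the subtree rooted at `node`."""
    if lookup is None:
        lookup = {}
    name = _attributes(node)[name_key]
    record = TreeNode(name=name, ancestors=list(ancestors))
    lookup[name] = record
    for child in node.get("children", []):
        # Recurse first so the child's own descendants are already filled in.
        build_lookup(child, name_key, (*ancestors, name), lookup)
        child_record = lookup[_attributes(child)[name_key]]
        record.children.append(child_record.name)
        record.descendants.append(child_record.name)
        record.descendants.extend(child_record.descendants)
    return lookup
```

Passing name_key="node_id" keys the result on n1, n2, ... instead of the accession.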

If you don't like using cell_set_accession as the name for the nodes, you can change that at line 120 of the script.

If you want to keep more or less information in your dict (for example, to drop the leaf nodes), you can identify leaf nodes by their empty node.children list.
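If all you need is the parent-to-children form from the question, a direct recursive walk over the loaded JSON may be enough. The following is only a sketch and is not taken from the linked script: get_node_id and parent_to_children are hypothetical names, and it assumes every node stores its node_id in a one-element list under a key ending in _attributes (the excerpt only shows node_attributes).

```python
def get_node_id(node):
    """Return the node_id from the node's one-element attribute list."""
    # Assumption: the attribute list sits under a key ending in '_attributes';
    # only 'node_attributes' appears in the excerpt from the question.
    for key, value in node.items():
        if key.endswith("_attributes"):
            return value[0]["node_id"]
    raise KeyError("no *_attributes key found in node")


def parent_to_children(node, mapping=None):
    """Map each parent's node_id to the node_ids of its immediate children."""
    if mapping is None:
        mapping = {}
    children = node.get("children", [])
    if children:  # nodes without children are leaves and get no entry
        mapping[get_node_id(node)] = [get_node_id(child) for child in children]
        for child in children:
            parent_to_children(child, mapping)
    return mapping
```

Calling parent_to_children(dend) on the dict returned by json.load should then give a mapping keyed on node_id, with the left split first in each list because the order of 'children' is preserved.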

The code was good enough for my purposes (which is a nice way of saying I haven't rigorously tested it). Feel free to reach out if something doesn't work as expected.