Renaming Anytree Parent and Child Name


I have a dataset as follows:

Unique Name  Parent  Child
US_SQ        A       A1
UC_LC        A       A2
UK_SJ        A2      A21
UI_QQ        B       B1

Now I want to set the output as follows:

US_SQ
├── A1
└── UC_LC
    └── UK_SJ
UI_QQ
└── B1

In other words, I want the tree to display the values from the Unique Name column.

This is the code that I am using:

import pandas as pd
from anytree import Node, RenderTree

def add_nodes(nodes, parent, child):
    if parent not in nodes:
        nodes[parent] = Node(parent)  
    if child not in nodes:
        nodes[child] = Node(child)
    nodes[child].parent = nodes[parent]

data = pd.DataFrame(columns=["Parent","Child"], data=[["US_SQ","A","A1"],["UC_LC","A","A2"],["UK_SJ","A2","A21"],["UI_QQ","B","B1"]])
nodes = {}  # store references to created nodes 
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)

roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots:         # you can skip this for roots[0], if there is no forest and just 1 tree
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))

Also, is there an efficient way to access the tree data? Is there a format for saving the tree so that the parent or children of a given node are easy to look up?

The data and problem above are taken from this question:

Read data from a pandas DataFrame and create a tree using anytree in python

Best answer

There are two parts to your question.

1. Renaming the Node

To show Unique Name in place of the Parent label, the aliasDict approach from the answer above works well, but you can also apply the mapping to the DataFrame directly and leave the rest of your code unchanged.

I have modified your DataFrame because it does not run as posted (each row has three values but only two column names are given), and your example does not clearly show that Unique Name is an alias for Parent in some cases.

import pandas as pd

data = pd.DataFrame(
    columns=["Unique Name", "Parent", "Child"],
    data=[
        ["US_SQ", "A", "A1"],
        ["US_SQ", "A", "A2"],
        ["UC_LC", "A2", "A21"],
        ["UI_QQ", "B", "B1"]
    ]
)

# Map each Parent label to its Unique Name, then replace the labels
# in both the Parent and Child columns
aliasDict = dict(data[["Parent", "Unique Name"]].values)
data["Parent"] = data["Parent"].replace(aliasDict)
data["Child"] = data["Child"].replace(aliasDict)

# Your original code - unchanged (add_nodes is the function from the question)
nodes = {}
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)
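To make the effect of the alias step easier to see, here is a minimal sketch that applies the same mapping to plain tuples instead of a DataFrame. The names `rows`, `alias`, and `edges` are made up for this illustration; the data is copied from the DataFrame above.

```python
# Rows are (unique_name, parent, child), copied from the DataFrame above.
rows = [
    ("US_SQ", "A", "A1"),
    ("US_SQ", "A", "A2"),
    ("UC_LC", "A2", "A21"),
    ("UI_QQ", "B", "B1"),
]

# Map each parent label to its alias (plays the role of aliasDict).
alias = {parent: unique for unique, parent, _ in rows}

# Replace labels that have an alias and leave all other labels untouched.
edges = [(alias.get(p, p), alias.get(c, c)) for _, p, c in rows]

for parent, child in edges:
    print(parent, "->", child)
# US_SQ -> A1
# US_SQ -> UC_LC
# UC_LC -> A21
# UI_QQ -> B1
```

Only labels that appear in the Parent column get an alias, so leaf nodes such as A1 and A21 keep their original names.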

2. Exporting to DataFrame

For the second part, anytree does not provide integration with pandas DataFrames. The bigtree Python package is an alternative that can build a tree from a DataFrame and export it back out of the box.

The whole example can be implemented as follows:

import pandas as pd
from bigtree import dataframe_to_tree_by_relation, print_tree, tree_to_dataframe

data = pd.DataFrame(
    columns=["Unique Name", "Parent", "Child"],
    data=[
        ["root", "root", "A"],  # added this line
        ["root", "root", "B"],  # added this line
        ["US_SQ", "A", "A1"],
        ["US_SQ", "A", "A2"],
        ["UC_LC", "A2", "A21"],
        ["UI_QQ", "B", "B1"]
    ]
)

# Replace Parent and Child labels with their aliases (same as above)
aliasDict = dict(data[["Parent", "Unique Name"]].values)
data["Parent"] = data["Parent"].replace(aliasDict)
data["Child"] = data["Child"].replace(aliasDict)

# Create a tree from dataframe, print the tree
root = dataframe_to_tree_by_relation(data, parent_col="Parent", child_col="Child")
print_tree(root)
# root
# ├── US_SQ
# │   ├── A1
# │   └── UC_LC
# │       └── A21
# └── UI_QQ
#     └── B1

# Export tree to dataframe
tree_to_dataframe(root, parent_col="Parent", name_col="Child")
#                     path  Child Parent
# 0                  /root   root   None
# 1            /root/US_SQ  US_SQ   root
# 2         /root/US_SQ/A1     A1  US_SQ
# 3      /root/US_SQ/UC_LC  UC_LC  US_SQ
# 4  /root/US_SQ/UC_LC/A21    A21  UC_LC
# 5            /root/UI_QQ  UI_QQ   root
# 6         /root/UI_QQ/B1     B1  UI_QQ
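If all you need is fast parent and child lookups, the same edge list can also be kept in two plain dictionaries. This is only a sketch of that idea and is not part of anytree or bigtree; `parent_of`, `children_of`, and `path_to_root` are hypothetical names, and the edges are the renamed Parent/Child pairs from the example above.

```python
from collections import defaultdict

# (parent, child) pairs after the alias replacement shown above.
edges = [
    ("root", "US_SQ"),
    ("root", "UI_QQ"),
    ("US_SQ", "A1"),
    ("US_SQ", "UC_LC"),
    ("UC_LC", "A21"),
    ("UI_QQ", "B1"),
]

parent_of = {}                   # child name -> parent name
children_of = defaultdict(list)  # parent name -> list of child names
for parent, child in edges:
    parent_of[child] = parent
    children_of[parent].append(child)

def path_to_root(name):
    """Follow parent links upward and return the path from the root to name."""
    path = [name]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return "/" + "/".join(reversed(path))

print(parent_of["UC_LC"])     # US_SQ
print(children_of["US_SQ"])   # ['A1', 'UC_LC']
print(path_to_root("A21"))    # /root/US_SQ/UC_LC/A21
```

Both lookups are single dictionary accesses, and the edge list itself can be saved as a two-column CSV. This assumes node names are unique across the tree, which holds for the data here.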

Source: I'm the creator of bigtree ;)