Network/tree data: calculate number of independent trees and average maximum edges per independent tree


I would like to draw network plots using tidygraph and ggraph.

I have a larger tibble with items connected via from and to. Some of the trees are connected (a0 and b0 in the example).

I would like to:

  1. Count the number of independent trees
  2. Calculate the average maximum number of edges (connections) per independent tree. The maximum should be counted "downstream", i.e. from a0 to k2 or a4, not from a0 to b0 in the example data.

Example:

library(tidygraph)
library(igraph)
library(ggraph)
library(tidyverse)


# make edges
edges <- tibble(from = c("a0","a1","a2","a3","b0","b1","c0","c1","a2","k1"),
                to = c("a1","a2","a3","a4","b1","a3","c1","c2","k1","k2"))


# make nodes
nodes <- unique(c(edges$from, edges$to))
nodes <- tibble(node = nodes,
                label = nodes)


# build the igraph object from the edges and nodes
routes_igraph <- graph_from_data_frame(d = edges,
                                       vertices = nodes,
                                       directed = TRUE)

routes_tidy <- as_tbl_graph(routes_igraph)

# plot network
ggraph(routes_tidy, layout = "tree") + 
  geom_edge_link() + 
  geom_node_point() + 
  theme_graph() +
  geom_node_text(aes(label = label), repel = TRUE)

Created on 2023-04-16 with reprex v2.0.2

Desired output

  1. Number of independent trees of the given edges and nodes: 2

  2. Average maximum edges per independent tree: 3.5, 2

Rui Barradas On BEST ANSWER

Here is one way. It borrows a function, height, from this SO post, modified to use mode = "in".

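# height of vertex v: the largest shortest-path distance from v to any
# vertex reachable downstream of it in g
height <- function(v, g) {
  D <- distances(g, to = v, mode = "in")
  max(D[D != Inf])
}

# split the vertex names by (weakly) connected component
cmp <- components(routes_igraph)
sp <- split(names(cmp$membership), cmp$membership)
# one sub-graph per component (induced.subgraph is the deprecated spelling)
sub_tree_list <- lapply(sp, \(v) induced_subgraph(routes_igraph, v))
# height of every vertex within its own component
sub_tree_height <- Map(\(g, v) sapply(v, height, g = g), sub_tree_list, sp)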

# number of components
length(sp)
#> [1] 2

# height of each sub-tree
sapply(sub_tree_height, max)
#> 1 2 
#> 4 2

Created on 2023-04-16 with reprex v2.0.2
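
As a quick check of what height measures, here is a minimal sketch that rebuilds the example graph from the question and evaluates it for the three starting nodes. The names g, h_a0, h_b0, h_c0 and h_a0_out are only for illustration. The last call assumes that asking for the distances going out of a0 is equivalent to the to = v, mode = "in" form used above.

```r
library(igraph)

# example edges from the question
edges <- data.frame(
  from = c("a0","a1","a2","a3","b0","b1","c0","c1","a2","k1"),
  to   = c("a1","a2","a3","a4","b1","a3","c1","c2","k1","k2")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# same function as in the answer
height <- function(v, g) {
  D <- distances(g, to = v, mode = "in")
  max(D[D != Inf])
}

h_a0 <- height("a0", g)  # a0 -> a1 -> a2 -> a3 -> a4 (or ... -> k1 -> k2): 4 edges
h_b0 <- height("b0", g)  # b0 -> b1 -> a3 -> a4: 3 edges
h_c0 <- height("c0", g)  # c0 -> c1 -> c2: 2 edges

# illustrative alternative: distances leaving a0 along edge direction
d_out <- distances(g, v = "a0", mode = "out")
h_a0_out <- max(d_out[is.finite(d_out)])
```

The values for a0 and b0 (4 and 3) are the ones that average to the 3.5 requested in the question.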


Edit

To get the maximum per initial node (here, the nodes whose names contain "0") and the average of those maxima per sub-tree, this works.

initials_list <- lapply(sp, \(x) x[grep("0", x)])
sub_tree_max_height <- Map(\(g, v) sapply(v, height, g = g), sub_tree_list, initials_list)
sapply(sub_tree_max_height, mean)
#>   1   2 
#> 3.5 2.0

Created on 2023-04-16 with reprex v2.0.2
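
The grep("0", x) step depends on the naming convention of the example data. If the real data does not follow it, one possible variation is to treat every vertex without incoming edges as an initial node. The sketch below is only an illustration of that assumption, not part of the answer's method. The names roots, root_height and avg_height are made up here.

```r
library(igraph)

# example edges from the question
edges <- data.frame(
  from = c("a0","a1","a2","a3","b0","b1","c0","c1","a2","k1"),
  to   = c("a1","a2","a3","a4","b1","a3","c1","c2","k1","k2")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# assumption: an initial node is any vertex with in-degree 0
roots <- names(which(degree(g, mode = "in") == 0))

# largest downstream distance from each root
root_height <- sapply(roots, function(r) {
  d <- distances(g, v = r, mode = "out")
  max(d[is.finite(d)])
})

# weakly connected component of each vertex, then average per component
comp <- components(g, mode = "weak")$membership
avg_height <- tapply(root_height, comp[roots], mean)

length(unique(comp))  # number of independent trees
avg_height            # average maximum edges per tree
```

On the example data this finds a0, b0 and c0 as initial nodes and gives the same averages as above (3.5 and 2). Note that under this assumption an isolated vertex would also count as an initial node, with height 0.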