Finding the cumulative sum of the value of nodes in a DAG

742 Views Asked by At

Suppose I have the following directed acyclic graph (DAG) with each node having a weight of 1.

simple DAG

I am interested in calculating the accumulated sum of each node based on the value of its ancestor. Assuming as I said earlier that the weight of each node is 1, then this is what I would expect to get

cumulative sum per node

This is what I tried to do:

 library(tidygraph, quietly = TRUE) 
 library(tidyverse)
 library(ggraph)

 # create adjacencies
 grafo_df <- tribble(
  ~from, ~to,
  "C", "A",
  "C", "B",
  "A", "D",
  "B", "D")
 
 # create the graph
 grafo <- as_tbl_graph(grafo_df)
 
 
 # calculate accumulated sum
 grafo %>% 
  arrange(node_topo_order()) %>% 
  mutate(
   
   revenue = 1,
   
   cum_weight = map_dfs(1, .f = function(node, path, ...) {
    
    sum(.N()$revenue[c(node, path$node)])
    
   })) %>% 
  as_tibble() %>% 
  unnest("cum_weight")
 
#> # A tibble: 4 x 3
#>   name  revenue cum_weight
#>   <chr>   <dbl>      <dbl>
#> 1 C           1          1
#> 2 A           1          2
#> 3 B           1          2
#> 4 D           1          3

Created on 2021-05-13 by the reprex package (v2.0.0)

As you can see, the accumulated sum of D results in 3 and not 4, because the value of D should be the sum of the accumulated value of A and B. I do not understand why D does not add 4

I have tried to understand the solution given here, but had a hard time understanding it

How can I get the accumulated sum?

Update # 1

I am not concerned (for the moment) with the complexity of the algorithm, that is, if the algorithm does it in O(V + E) it is not relevant.

Something important that is mentioned in this question is about the problem of counting twice, that is, the partial sum of the value of A is equal to C(1) + A(1) = 2, and the partial sum of the value of B is equal to C(1) + B (1) = 2, so to say that the value of D is not equal to the partial sums of A (2) + B(2) because the value of C would be duplicating I think it does not apply in this situation due to the following:

Let's imagine that each of these 4 nodes (A, B, C and D) are internet nodes that generate revenue of $1 each, so the total accumulated income of the 4 nodes would be $4. If D is the convergence node of the rest of nodes, then in a scenario where D stops working, the income of the remaining nodes and that of D would no longer be possible, therefore, its value is $4.

Update # 2

If I add a new path from C to D then the value of D should always be 4 because the number of dependent nodes is maintained, that is, what should matter is the number of dependent nodes in the accumulated sum. For example, in the solution proposed by @ThomasIsCoding, if I add this new path, the value of D is now 5, I think partly that their algorithm uses the degrees as a parameter to calculate the cumulative sum, however, if I add a additional node then the calculation is correct.

Update # 3

The example that I have placed is simple with the intention that it is easy to understand the objective, however, I did not specify that it should be generalizable for a graph with many nodes with three different topologies. The outermost layers are trees, the middle layers are rings, and the innermost layer is a full mesh.

2

There are 2 best solutions below

9
On BEST ANSWER

Here is an igraph option using distance with argument mode = "in"

  • If your nodes are unweighted, i.e., revenue=1 for all nodes
g <- graph_from_data_frame(grafo_df)

data.frame(name = names(V(g))) %>%
  mutate(revenue = 1) %>%
  mutate(cum_weight = rowSums((!is.infinite(distances(g, mode = "in"))) %*% diag(revenue)))

which gives you

  name revenue cum_weight
1    C       1          1
2    A       1          3
3    B       1          2
4    F       1          1
5    D       1          5
  • If your nodes are weighted, e.g.,
data.frame(name = names(V(g))) %>%
  mutate(revenue = 1:n()) %>%
  mutate(cum_weight = rowSums((!is.infinite(distances(g, mode = "in"))) %*% diag(revenue)))

which gives you

  name revenue cum_weight
1    C       1          1
2    A       2          7
3    B       3          4
4    F       4          4
5    D       5         15

Data

grafo_df <- tribble(
  ~from, ~to,
  "C", "A",
  "C", "B",
  "A", "D",
  "C", "D",
  "B", "D",
  "F", "A"
)

and the DAG by plot(g) is given as

enter image description here

1
On

Now the question is clear, so I propose an algorithm, I cannot code it since I don't know the language that you are using.

For each node Ni in the graph we will calculate the set of ancestors Ai, then the accumulated sum for each node will be |Ai| + 1.

  1. Initialize all nodes with an empty ancestor set Ai = {}
  2. Start with a set S0 containing all nodes with no incoming edges
  3. Initialize the next set Sn+1
  4. Iterate over Sn, for each node N:
  5. For all nodes D with an incoming edge from N:
    1. Merge the ancestor set of D with the ancestor set of N plus N itself
    2. remove the egde N->D
  6. If D has no other incoming edges add it to Sn+1
  7. If Sn+1 is not empty, increase pass to n+1 and repeat from 2.

The big limit of this solution is the complexity, I'll try later to find some optimized solution.