GraphFrames detect exclusive outbound relations

In my graph I need to detect vertices that have no inbound relations. In the example below, "a" is the only node that no other node points to.

a -->  b
b -->  c
c -->  d
c -->  b

I would really appreciate any examples to detect "a" type nodes in my graph.

Thanks

There is 1 best solution below

Unfortunately, the approach is not that simple, because graph.degrees, graph.inDegrees and graph.outDegrees do not return vertices with 0 edges. (See the Scala documentation, which holds true for Python too: https://graphframes.github.io/graphframes/docs/_site/api/scala/index.html#org.graphframes.GraphFrame)

So the following code will always return an empty DataFrame:

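from graphframes import GraphFrame

# vertices needs an "id" column; edges needs "src" and "dst" columns
g = GraphFrame(vertices, edges)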

# check for start points 
g.inDegrees.filter("inDegree==0").show()
+---+--------+
| id|inDegree|
+---+--------+
+---+--------+

# or check for end points 
g.outDegrees.filter("outDegree==0").show()
+---+---------+
| id|outDegree|
+---+---------+
+---+---------+

# or check for any vertices that are alone without edge
g.degrees.filter("degree==0").show()
+---+------+
| id|degree|
+---+------+
+---+------+
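
The empty results follow from how a degree table is built: it is, in effect, a count grouped over the edge list, so an id that never appears as a destination has no row at all rather than a row with 0. A minimal plain-Python sketch of that idea (no Spark involved; the edge list is the one from the question and the variable names are only illustrative):

```python
from collections import Counter

# Edges from the question, as (src, dst) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")]

# Counting edges per destination mimics what an in-degree table contains.
in_degree = Counter(dst for _, dst in edges)

# "a" never appears as a destination, so it is simply missing from the table.
zero_rows = [v for v, n in in_degree.items() if n == 0]

print(dict(in_degree))  # {'b': 2, 'c': 1, 'd': 1}
print(zero_rows)        # []
```

Filtering such a table for a count of 0 can therefore never match anything.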

What works is a left, right or full join of the inDegrees and outDegrees results, followed by a filter on the NULL values of the respective column.

The join gives you both degree columns side by side, with NULL in inDegree for start vertices and NULL in outDegree for end vertices.

g.inDegrees.join(g.outDegrees,on="id",how="full").show()

+---+--------+---------+
| id|inDegree|outDegree|
+---+--------+---------+
| b6|       1|     null|
| a3|       1|        1|
| a4|       1|     null|
| c7|       1|        1|
| b2|       1|        2|
| c9|       3|        1|
| c5|       1|        1|
| c1|    null|        1|
| c6|       1|        1|
| a2|       1|        1|
| b3|       1|        1|
| b1|    null|        1|
| c8|       3|     null|
| a1|    null|        1|
| c4|       1|        4|
| c3|       1|        1|
| b4|       1|        1|
| c2|       1|        3|
|c10|       1|     null|
| b5|       2|        1|
+---+--------+---------+

Now you can filter for whatever you are looking for:

my_in_Degrees=g.inDegrees
my_out_Degrees=g.outDegrees

# get starting vertices (no parents, i.e. no inbound edges)
my_in_Degrees.join(my_out_Degrees,on="id",how="full").filter(my_in_Degrees.inDegree.isNull()).show()
+---+--------+---------+
| id|inDegree|outDegree|
+---+--------+---------+
| c1|    null|        1|
| b1|    null|        1|
| a1|    null|        1|
+---+--------+---------+


# get ending vertices (no children, i.e. no outbound edges)
my_in_Degrees.join(my_out_Degrees,on="id",how="full").filter(my_out_Degrees.outDegree.isNull()).show()
+---+--------+---------+
| id|inDegree|outDegree|
+---+--------+---------+
| b6|       1|     null|
| a4|       1|     null|
|c10|       1|     null|
+---+--------+---------+
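
Applied to the four-edge graph from the question, the same full-join-and-filter logic can be traced in plain Python. This is only a sketch of the idea, not GraphFrames code: the two counters stand in for the inDegrees and outDegrees DataFrames, None stands in for NULL, and all names are illustrative.

```python
from collections import Counter

# Edges from the question, as (src, dst) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")]

# Stand-ins for g.inDegrees and g.outDegrees: only ids with at least one edge appear.
in_degree = Counter(dst for _, dst in edges)
out_degree = Counter(src for src, _ in edges)

# Full outer join on id: every id from either table, None where a side has no row.
joined = {
    v: (in_degree.get(v), out_degree.get(v))
    for v in sorted(set(in_degree) | set(out_degree))
}

# Start vertices have no in-degree row; end vertices have no out-degree row.
starts = [v for v, (ind, _) in joined.items() if ind is None]
ends = [v for v, (_, outd) in joined.items() if outd is None]

print(starts)  # ['a']
print(ends)    # ['d']
```

Note that a vertex with no edges at all appears in neither degree table, so it is also absent from the join; finding those requires comparing against the full vertex list.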