Computing PageRank on a digraph with edge weights using GraphFrames

Assume I use GraphFrames to construct a digraph g whose edge weights are positive real numbers. I would then like to compute PageRank taking the edge weights into account. I don't see how to do this from the documentation for graphframes.GraphFrame.pageRank. Calling results = g.pageRank(resetProbability=0.15, maxIter=10) computes PageRank but, as far as I can tell, treats every edge as having weight 1. Am I correct?

Compare this to networkx.algorithms.link_analysis.pagerank_alg.pagerank which allows for computing PageRank on a digraph with edge weights, see documentation.

Thanks for reading and any help is appreciated.

There is 1 answer below

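I think we can probably 'flatten' the data first: repeat each edge as many times as its weight, then run the ordinary (unweighted) pageRank on the expanded edge list. As written, this only works for positive integer weights; real-valued weights would have to be scaled and rounded to integers first.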

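import org.apache.spark.sql.functions.{col, explode, udf}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
val df = Seq((1, 2, 3), (2, 3, 4), (3, 4, 1)).toDF("src", "dst", "weight")
// Turn an integer weight w into the sequence 1..w so that explode emits w rows.
val getArray = udf[Seq[Int], Int](w => (1 to w).toList)
// Repeat each edge `weight` times, then keep only the edge endpoints.
val flatDf = df
  .withColumn("dummy1", getArray(col("weight")))
  .withColumn("dummy2", explode(col("dummy1")))
  .select("src", "dst")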
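
To see why repeating edges should work, here is a minimal pure-Python sketch (no Spark needed). It assumes integer weights and a textbook power iteration in which dangling rank is spread uniformly. The pagerank helper and the sample edges are made up for illustration, and the normalization is not claimed to match the numbers GraphFrames returns. It only shows that weighted PageRank and unweighted PageRank on the repeated-edge list agree, while simply ignoring the weights gives a different ranking.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank over (src, dst, weight) triples.

    Hypothetical helper for illustration only. Each node splits its rank
    among its out-edges in proportion to edge weight; rank held by nodes
    with no out-edges is spread uniformly over all nodes.
    """
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    n_nodes = len(nodes)

    # Total outgoing weight per node (0 for dangling nodes).
    out_weight = defaultdict(float)
    for s, _, w in edges:
        out_weight[s] += w

    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iters):
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / n_nodes + damping * dangling / n_nodes
        nxt = {n: base for n in nodes}
        for s, d, w in edges:
            nxt[d] += damping * rank[s] * w / out_weight[s]
        rank = nxt
    return rank


# Made-up sample graph: node 1 has two out-edges with different weights,
# so the weights actually influence the result.
weighted = [(1, 2, 3), (1, 3, 1), (2, 3, 4), (3, 1, 1)]

# "Flatten": repeat each edge `weight` times, each copy with unit weight.
flattened = [(s, d, 1) for s, d, w in weighted for _ in range(w)]

# Baseline that ignores the weights entirely.
unweighted = [(s, d, 1) for s, d, _ in weighted]

r_weighted = pagerank(weighted)
r_flat = pagerank(flattened)
r_unweighted = pagerank(unweighted)

for node in sorted(r_weighted):
    print(node, r_weighted[node], r_flat[node], r_unweighted[node])
```

The first two columns printed for each node should agree up to floating-point error, and the third should not. Keep in mind that the flattened edge list grows with the sum of the weights, so large weights make this approach expensive.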