how to iterate when calculating pagerank with mapreduce

2.8k Views Asked by At

I have questions when I'm trying to implementing PageRank with mapreduce. I want to cite the codes here https://stackoverflow.com/a/5029780/1117436 to describe the problem.

map ((url,PR), out_links) //PR = random at start
for link in out_links
  emit(link, ((PR/size(out_links)), url))

reduce(url, List[(weight, url)):
   PR =0
   for v in weights
   PR = PR + v
   Set urls = all urls from list

 emit((url, PR), urls)

In the above process, it's clearly that the second parameter of the input of map procedure is the Out links of url but the second parameter of the output of reduce procedure seems to be the In links of url. So how can these codes work iteratively?

Then what I want to ask is how to write codes to make the pagerank alrorithm work properly?

UPDATE: I think this answer solves my problem. https://stackoverflow.com/a/13568286/1117436

2

There are 2 best solutions below

1
On

There are already a couple of graph processing frameworks.

Look at Apache Giraph which can used for graph processing. Giraph is based on MR. GoldenOrb is in a very early stage. Also, take a look at Apache Hama which is an implementation of BSP, this has it's own computation engine and is not MR based, but uses HDFS for storage. Hama can also be used for graph processing.

0
On

You can implement iterative algorithms using MapReduce, but it might not be the best and more efficient way (because you move stuff to HDFS/disk each iteration).

Having said that, if you are interested in looking at how one might implement something like PageRank using MapReduce have a look here:

Start from the run() method in PageRank.java

If you are interested, you can have a look at a bunch of old (i.e. 2009) slides here:

Now, you can have much more fun at implementing/running PageRank with a Pregel clone such as Apache Giraph as Praveen already suggested to you.