Is Apache Hama suitable for building a decision tree?

I have implemented PLANET, Google's framework for building decision trees, in Hadoop. It starts with a single vertex, and successive MapReduce jobs add more vertices until the tree is fully built. One major problem is that many MapReduce jobs run one after another, so the cumulative cost of starting new jobs is very high.

I have often read that Apache Hama is suitable for iterative algorithms, such as graph algorithms. Can Hama build a new graph, or can it only take an existing graph as input and run computations on it? Would it be easy to port my project to Hama? Thanks.

There is 1 best solution below

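Hama can indeed construct a decision tree using the algorithm described in the PLANET paper, and much more efficiently than MapReduce, because the iterations run as supersteps inside a single BSP job instead of as a chain of separately started jobs.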

Hama does not need a graph as input. Have a look at the Hama ML (machine learning) module, which usually takes raw feature vectors as input directly from HDFS.
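To make the idea concrete, here is a minimal single-process sketch in plain Python. It is not Hama's API and not the PLANET implementation; it only imitates the pattern: each "peer" holds a partition of raw feature vectors, every pass of the loop plays the role of one superstep, and summing the per-peer counters stands in for message passing plus the barrier sync. All names (`route`, `local_stats`, `build_tree`, `predict`), the fixed candidate thresholds, the binary 0/1 labels, and the toy data are assumptions made for illustration.

```python
from collections import Counter

def route(tree, x):
    """Follow existing splits from the root to the node where x currently lands."""
    node = 0
    while node in tree and tree[node][0] == "split":
        _, f, t = tree[node]
        node = 2 * node + 1 if x[f] <= t else 2 * node + 2
    return node

def local_stats(partition, tree, frontier, thresholds):
    """One peer's work in a superstep: label counts for every open node."""
    stats = Counter()
    for x, y in partition:
        node = route(tree, x)
        if node not in frontier:
            continue
        stats[(node, "total", y)] += 1
        for f, ts in thresholds.items():
            for t in ts:
                if x[f] <= t:  # only the left side is counted; right = total - left
                    stats[(node, f, t, y)] += 1
    return stats

def gini(c0, c1):
    n = c0 + c1
    return 0.0 if n == 0 else 1.0 - (c0 / n) ** 2 - (c1 / n) ** 2

def build_tree(partitions, thresholds, max_depth=3):
    tree, frontier, supersteps = {}, {0}, 0
    for depth in range(max_depth + 1):
        if not frontier:
            break
        # Simulated superstep: every peer computes local stats, then they are merged.
        merged = Counter()
        for part in partitions:
            merged.update(local_stats(part, tree, frontier, thresholds))
        supersteps += 1
        next_frontier = set()
        for node in sorted(frontier):
            n0, n1 = merged[(node, "total", 0)], merged[(node, "total", 1)]
            best = None
            if depth < max_depth and n0 > 0 and n1 > 0:
                for f, ts in thresholds.items():
                    for t in ts:
                        l0, l1 = merged[(node, f, t, 0)], merged[(node, f, t, 1)]
                        r0, r1 = n0 - l0, n1 - l1
                        if l0 + l1 == 0 or r0 + r1 == 0:
                            continue  # split would leave one child empty
                        score = ((l0 + l1) * gini(l0, l1)
                                 + (r0 + r1) * gini(r0, r1)) / (n0 + n1)
                        if best is None or score < best[0]:
                            best = (score, f, t)
            if best is None:
                tree[node] = ("leaf", 1 if n1 > n0 else 0)
            else:
                tree[node] = ("split", best[1], best[2])
                next_frontier.update((2 * node + 1, 2 * node + 2))
        frontier = next_frontier
    return tree, supersteps

def predict(tree, x):
    return tree[route(tree, x)][1]

# Toy data (hypothetical): label is 1 when x[0] > 5 and x[1] > 2.
partitions = [
    [((1, 1), 0), ((2, 3), 0), ((6, 1), 0), ((7, 3), 1)],  # peer A
    [((3, 1), 0), ((8, 4), 1), ((9, 1), 0), ((4, 4), 0)],  # peer B
]
thresholds = {0: [5], 1: [2]}  # candidate split points per feature index
tree, supersteps = build_tree(partitions, thresholds)
```

The whole tree is grown inside one loop, one level per pass, which is the part that maps onto supersteps. In a real BSP job the `merged.update(...)` step would presumably be replaced by peers sending their counters to a master peer and waiting at the barrier, but the exact calls depend on the Hama version you use.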

I have created a new issue in the Apache Jira for Hama to track progress on this algorithm.