How can I connect Apache Nutch 2.x to a remote HBase cluster?

698 Views Asked by zahid adeel At 27 March 2014 at 05:22

I have two machines. One runs HBase 0.92.2 in pseudo-distributed mode, and the other runs the Nutch 2.x crawler. How can I configure them so that the HBase 0.92.2 machine acts as back-end storage and the Nutch 2.x machine acts as the crawler?

zahid adeel On 28 March 2014 at 05:12 BEST ANSWER

I finally did it. It was easy to do. I am sharing my experience here; maybe it can help someone.

1- Edit hbase-site.xml on the HBase machine to configure pseudo-distributed mode.

2- MOST IMPORTANT: on the HBase machine, replace the localhost IP in /etc/hosts with the machine's real network IP, like this:

10.11.22.189 master localhost

Here 10.11.22.189 is the HBase machine's IP. (Note: if you don't change the localhost IP on the HBase machine, the remote Nutch crawler won't be able to connect to it.)

4- Copy or symlink hbase-site.xml into $NUTCH_HOME/conf.

5- Start your crawler and check that it works.
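
The /etc/hosts change in step 2 is easy to get wrong, so here is a minimal sketch in Python that lists which hostnames a hosts file still maps to a loopback address. The function name loopback_hostnames and the sample text are my own illustration and not part of HBase or Nutch. It only assumes the usual hosts-file layout of an IP address followed by one or more names.

```python
import ipaddress


def loopback_hostnames(hosts_text):
    """Return the hostnames that a hosts file maps to a loopback address."""
    flagged = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not a plain IP address; skip the line
        if address.is_loopback:
            flagged.extend(fields[1:])
    return flagged


# Sample taken from step 2: both names point at the real network IP.
sample = "10.11.22.189 master localhost\n"
print(loopback_hostnames(sample))  # prints []
```

To check a real machine, pass in the contents of /etc/hosts from the HBase machine. If the HBase hostname (master in the example above) appears in the result, it still resolves to a loopback address on that machine.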
