Failover is not triggered when the active NameNode crashes


I am using Apache Hadoop 2.7.1 on a cluster that consists of three nodes:

nn1 (master NameNode)

nn2 (second NameNode)

dn1 (DataNode)

I have configured high availability with a nameservice, and ZooKeeper is running on all three nodes, with the instance on nn2 acting as leader.
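For context, a minimal automatic-failover configuration of this kind typically combines entries like these in hdfs-site.xml and core-site.xml (the nameservice name mycluster and port 2181 here are placeholders for illustration, not my actual values):

<!-- hdfs-site.xml: define the nameservice and its two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<!-- hdfs-site.xml: let the ZKFC perform failover automatically -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- core-site.xml: ZooKeeper quorum running on all three nodes -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>nn1:2181,nn2:2181,dn1:2181</value>
</property>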

First of all, I have to mention that nn1 is active and nn2 is standby.

When I kill the NameNode process on nn1, nn2 becomes active, so automatic failover works.

But in the following scenario (which I apply when nn1 is active and nn2 is standby):

when I turn off nn1 entirely (the whole machine crashes),

nn2 stays standby and doesn't become active, so automatic failover does not happen,

with a noticeable error in the log:

Unable to trigger a roll of the active NN (which was nn1 and is of course now down)

Shouldn't automatic failover happen with the two existing journal nodes on nn2 and dn1?

And what could be the possible reasons?


There are 2 best solutions below

BEST ANSWER

My problem was solved by altering dfs.ha.fencing.methods in hdfs-site.xml to include not only SSH fencing but also a shell fencing method that always returns true:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
         shell(/bin/true)</value>
</property>

Automatic failover will fail if fencing fails. I specified two methods; the second one, shell(/bin/true), always returns success. This works around the case where the primary NameNode machine goes down completely: the host is unreachable, so the sshfence method fails, and no failover is performed. We want to avoid that, so the second method allows the failover to proceed anyway.
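Note that for the sshfence method itself to work, the ZKFC also needs the SSH private key it should use to reach the other NameNode's machine; a typical entry looks like this (the key path is only an example, adjust it to your environment):

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>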

You can find details here: https://www.packtpub.com/books/content/setting-namenode-ha

ANOTHER ANSWER

This appears to be due to a bug in the sshfence fencing method, identified as HADOOP-15684, fixed in 3.0.4, 3.1.2, and 3.2.0 as well as backported to 2.10.0 via HDFS-14397.