How to track HDFS Active Namenode change event in NiFi?

I have an HDFS cluster with Active and Standby Namenodes. Sometimes when the cluster is restarted, the Namenodes swap roles - the Standby becomes Active, and vice versa.

Then I have a NiFi flow with a PutParquet processor that writes files to this HDFS cluster. The processor's Directory property is set to "hdfs://${namenode}/some/path", where the ${namenode} variable holds a value like "first.namenode.host.com:8020".

Now, when the cluster is restarted and the active Namenode changes to "second.namenode.host.com:8020", the configuration in NiFi is not updated, so the processor still tries to use the old Namenode address and throws an exception (I don't remember the exact error text, but I don't think it matters for this question).

And now the question is: how can I track this event in NiFi, so that the PutParquet processor configuration is updated automatically when the HDFS configuration changes?

NiFi version is 1.6.0, HDFS version is 2.6.0-cdh5.8.3

3 Answers

BEST ANSWER

I haven't confirmed this, but I thought that with HA HDFS (Active and Standby NameNodes) you would have the HA properties set in your *-site.xml files (probably core-site.xml) and would refer to the "cluster name" (nameservice), which the Hadoop client then resolves into a list of NameNodes and tries to connect to. If that's the case, try the cluster name (see the core-site.xml file on the cluster) rather than a hardcoded NameNode address.
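
For illustration, here is a minimal sketch of what such an HA configuration might look like; the nameservice name "mycluster" and the NameNode aliases "nn1"/"nn2" are assumptions, not values from the actual cluster:

    <!-- hdfs-site.xml: define the nameservice and its two NameNodes -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>  <!-- assumed nameservice name -->
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>first.namenode.host.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>second.namenode.host.com:8020</value>
    </property>
    <property>
      <!-- lets the client fail over to whichever NameNode is currently active -->
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

With files like these referenced in the processor's "Hadoop Configuration Resources" property, the Directory could then point at "hdfs://mycluster/some/path" instead of a concrete NameNode host.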

ANSWER

Two things that you could do:

  • If you know the IP addresses or hostnames of the two NameNodes, you can try this: route the failure relationship of PutParquet either to an UpdateAttribute processor that changes the directory value (if you're using NiFi Expression Language in the Directory property), or to a second PutParquet processor whose Directory is configured with the standby NameNode. A sketch of the UpdateAttribute approach follows this list.
  • You could use PutHDFS instead, but I'm not sure whether PutParquet offers better performance than PutHDFS.
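
For the first option, here is a minimal sketch of the UpdateAttribute configuration, assuming the flow keeps the current NameNode in a flowfile attribute named "namenode" (the attribute name and the hostnames are assumptions):

    Property name : namenode
    Property value: ${namenode:equals('first.namenode.host.com:8020'):ifElse('second.namenode.host.com:8020', 'first.namenode.host.com:8020')}

The expression flips the attribute to the other NameNode whenever a flowfile arrives via PutParquet's failure relationship; routing UpdateAttribute's success relationship back into PutParquet then retries the write against the other address.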

ANSWER

It seems I have solved my problem, but it was not really a "problem" at all :) Here is the solution: httpfs error Operation category READ is not supported in state standby.

I did not have to track the active Namenode change event manually within NiFi; instead, I just had to configure my Hadoop client properly with core-site.xml so that it resolves the actual Namenode automatically.

So the solution is simply to set the property "fs.defaultFS" in core-site.xml to the value of the property "dfs.nameservices" from hdfs-site.xml (in my case, "fs.defaultFS" in core-site.xml pointed to the concrete host of the active Namenode, "first.namenode.host.com:8020").
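
As an illustration, assuming the nameservice defined by "dfs.nameservices" is called "mycluster" (the name here is an assumption), the change amounts to:

    <!-- core-site.xml, before: hardcoded to whichever NameNode happened to be active -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://first.namenode.host.com:8020</value>
    </property>

    <!-- core-site.xml, after: point at the nameservice defined in hdfs-site.xml -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>

After that, the PutParquet Directory can reference "hdfs://mycluster/some/path" (or just "/some/path", relative to fs.defaultFS) and no longer needs to know which NameNode is currently active.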

I say "seems" because I have not tested this solution yet. But using this approach I can write to HDFS cluster without setting active hanemode address anywhere in NiFi. I just set it to use some "nameservice" rather then actual address, so I think if actual address changes - probably this does not affect NiFi, and Hadoop client handles this event.

Later I'm going to test it.

Thanks to @mattyb for the idea!