How can I send data from Node-RED to Hadoop?


I need a mechanism to send data from Node-RED so that it can be stored in HDFS (Hadoop), preferably as a stream. I am thinking about writing the data to a 'websocket out' node and using a Flume agent to read it.

I am new to node-red.

Could you please let me know whether I am heading in the right direction, and clarify with some details if I am not? Any alternative approach would also be fine.

Update: Node-RED offers a 'bluemixhdfs' node, but it is tied exclusively to IBM Bluemix, whereas I am using plain vanilla Hadoop.


There are 2 best solutions below

0
On

I recently had a similar issue in a small project of mine, so I will try to explain my approach.

A little background: In the application, I had to do some processing on real-time streaming data from different data sources. At the same time, I also needed to store the streaming data for future processing.

I used the Apache Kafka message broker as the integration layer between Node-RED and HDFS (and also to feed the Apache Spark Streaming processing engine).

In Node-RED, I used a Kafka node to publish the streaming data from the different data sources to separate topics in Kafka.

[Figure: Node-RED flow with streaming data sources and Apache Kafka]
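To make the "one topic per data source" idea concrete, here is a minimal Python sketch of the routing logic, written outside Node-RED purely for illustration. The source names, topic names, and the `send` callback are all hypothetical and do not come from my actual flow; in Node-RED the Kafka node does this job.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical mapping from a data-source name to its own Kafka topic.
TOPIC_BY_SOURCE: Dict[str, str] = {
    "sensor": "iot-sensors",
    "weather": "weather-api",
}


def to_record(source: str, payload: dict) -> Tuple[str, bytes]:
    """Return (topic, value) for one message; unknown sources raise KeyError."""
    topic = TOPIC_BY_SOURCE[source]
    # Serialize as UTF-8 JSON so downstream consumers get a stable format.
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return topic, value


def publish(source: str, payload: dict, send: Callable[[str, bytes], object]) -> None:
    """Route one message to its topic using the supplied send(topic, value) callable."""
    topic, value = to_record(source, payload)
    send(topic, value)
```

If you use the kafka-python client, `send` could be something like `lambda t, v: producer.send(t, value=v)` with `producer = KafkaProducer(bootstrap_servers="localhost:9092")`. The broker address is an assumption; adjust it to your cluster.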

The HDFS Sink Connector, a Kafka Connect component, is then used to store the streaming data in HDFS.

[Figure: Flow architecture for Node-RED to HDFS and Spark Streaming using the Kafka message broker]
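As a rough illustration of the connector setup, the sketch below builds the JSON payload that Kafka Connect expects when a connector is registered. It assumes the Confluent HDFS sink connector (class `io.confluent.connect.hdfs.HdfsSinkConnector`); the connector name, topic names, HDFS URL, and flush size are placeholders, not the values from my project.

```python
import json
from typing import Dict, List


def hdfs_sink_config(name: str, topics: List[str], hdfs_url: str,
                     flush_size: int = 1000) -> Dict[str, object]:
    """Build a Kafka Connect registration payload for an HDFS sink connector."""
    return {
        "name": name,
        "config": {
            # Assumed: the Confluent HDFS sink connector is installed on the worker.
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            # Kafka Connect takes the topic list as one comma-separated string.
            "topics": ",".join(topics),
            "hdfs.url": hdfs_url,
            # Number of records written to a file before it is committed to HDFS.
            "flush.size": str(flush_size),
        },
    }


def to_json(payload: Dict[str, object]) -> str:
    """Serialize the payload for the Kafka Connect REST API."""
    return json.dumps(payload, indent=2)
```

The resulting JSON would be sent with `POST /connectors` to the Kafka Connect REST API (port 8083 by default). A small `flush.size` gives fresher files in HDFS at the cost of many small files, so it is worth tuning for your data rate.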

This approach can also be adopted when many streaming data sources (IoT sensors, stock market data, social media data, weather APIs, etc.) need to be connected in a single Node-RED flow and the data then has to be stored in HDFS for further processing.

0
On

I'm afraid I'm not a Hadoop expert, so I probably can't provide an answer directly. However, it looks like Kafka supports websockets, and this should be reasonably performant.

Depending on your architecture, though, you should pay some attention to websocket security. Unless Node-RED and Hadoop are both on a private, secured network, websockets may be tricky to secure properly.

I think that websocket performance would be reasonable as long as the data size per transaction isn't too large (KB rather than GB). You will need to do some testing, though, as too many factors influence the performance of Node-RED to easily predict whether it will meet your requirements.
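If some of your payloads do turn out to be large, one option is to split them into smaller pieces before sending, so that each websocket message stays in the KB range. The following Python sketch shows the idea only; the 64 KiB default is an arbitrary assumption, and the receiving side would need to reassemble the chunks in order.

```python
from typing import Iterator


def chunk_bytes(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield consecutive slices of data, each at most chunk_size bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

Timing how long it takes to push a realistic number of such chunks through your flow would be a simple first performance test.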

Node-RED supports a great many types of connectivity so if websockets don't work in your architecture, there are plenty of others such as UNIX pipes, TCP or UDP connections.