I need a mechanism to send data from node-red, to be stored in HDFS (Hadoop). I prefer the data to be streamed. I am thinking about using the 'websocket out' node to write the data to it and use a Flume agent to read.
I am new to node-red.
Could you please let know if I am in the right direction and clarify with some details if I am not? Any alternate approach should also be fine.
Update: node-red offers 'bluemixhdfs' node which is exclusively tied up with IBM bluemix whereas I am using only a vanilla hadoop.
I recently had the similar issue for a small project of mine. So I try to explain my approach.
A little background: In the application, I had to do some processing on real-time streaming data from different data sources. At the same time, I also needed to store the streaming data for future processing.
I used Apache Kafka message broker as an integration agent between Node-RED and HDFS (and also for Apache Spark Stream processing engine).
In Node-RED, I used Kafka node to publish streaming data from different data sources to separate topics in Kafka. Node-RED flow with Streaming data sources and Apache Kafka
HDFS Sink Connector, a Kafka Connect component, is then used to store the streaming data to the HDFS. Flow Architecture for Node-RED to HDFS and Spark Streaming using Kafka Message broker
This approach can also be adopted when many streaming data sources like IoT sensors, Stock market data, Social media data, weather api, etc. are to be connected as a single flow using Node-RED and then want to use HDFS for storing these data for further processing.