NiFi data movement from SQL databases to HDFS and Hive


In NiFi I have flows that copy data from SQL databases (SQL Server, MySQL, Oracle, etc.) to Hive. After depositing the .avro files in HDFS and defining an .avsc schema (manually), I build an external table (used as a temp table) and finally copy that table's data into a final Hive-managed table stored as Parquet with Snappy compression.
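For reference, the final table's definition looks roughly like this (a minimal HiveQL sketch; the database, table, and column names are placeholders, not my real schema):

```sql
-- Final Hive-managed table: parquet files with snappy compression
-- (warehouse_db.customers and its columns are hypothetical examples)
CREATE TABLE IF NOT EXISTS warehouse_db.customers (
  id         INT,
  name       STRING,
  updated_at TIMESTAMP
)
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY');
```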

At the moment my flows look like the following:

GenerateFlowFile -> 
DeleteHDFS (delete the avro files left in HDFS by the previous flow execution) -> 
ExecuteSQL (fetch the new avro data) -> 
PutHDFS (write the avro files to HDFS) -> 
ReplaceText (empty the flowfile content) -> 
MergeContent (e.g., 28 flowfiles generated by ExecuteSQL are merged into 1 flowfile) -> 
ReplaceText (DROP AND CREATE EXTERNAL TABLE pointing at the avro data and the .avsc file; see the sketch after this list) -> 
PutHiveQL -> 
ReplaceText (INSERT INTO FINAL_TABLE SELECT * FROM EXTERNAL_TABLE) -> 
PutHiveQL
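The last four processors amount to issuing two HiveQL batches per run, roughly like this (again with placeholder names and paths):

```sql
-- First PutHiveQL: rebuild the external temp table over the freshly written avro files
DROP TABLE IF EXISTS staging_db.customers_ext;
CREATE EXTERNAL TABLE staging_db.customers_ext
STORED AS AVRO
LOCATION '/data/staging/customers'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/customers.avsc');

-- Second PutHiveQL: copy the staged rows into the final managed parquet table
INSERT INTO TABLE warehouse_db.customers
SELECT * FROM staging_db.customers_ext;
```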

I would like to know whether this approach is the right one for moving data between these environments, or whether there is a better (more effective and efficient) way to do it.

I'd also like to reduce the number of processors in my NiFi dataflows and, if possible, improve the efficiency of the data movement.
