In NiFi I have flows that copy data from SQL databases (SQL Server, MySQL, Oracle, etc.) to Hive. After depositing the .avro files in HDFS and manually defining an .avsc schema, I build an external table (used as a temp/staging table) and then copy that table's data into a final Hive-managed table stored as Parquet with Snappy compression.
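For reference, the final managed table is defined roughly like this; the table name matches the one used further down, but the columns shown here are just placeholders, and the TBLPROPERTIES entry is one common way to request Snappy for Parquet output:

    -- final Hive-managed table: Parquet files, Snappy compression
    CREATE TABLE FINAL_TABLE (id INT, name STRING, updated_at TIMESTAMP)
    STORED AS PARQUET
    TBLPROPERTIES ('parquet.compression'='SNAPPY');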
At the moment my flows look like the following:
GenerateFlowFile ->
DeleteHDFS (delete the Avro files left in HDFS by the previous flow execution) ->
ExecuteSQL (get the new Avro data) ->
PutHDFS (write the Avro files to HDFS) ->
ReplaceText (empty the FlowFile content) ->
MergeContent (e.g., 28 FlowFiles generated by ExecuteSQL are merged into 1 FlowFile) ->
ReplaceText (DROP and CREATE EXTERNAL TABLE, pointing to the Avro data and the .avsc file; see the HiveQL sketch after this list) ->
PutHiveQL ->
ReplaceText (INSERT INTO FINAL_TABLE SELECT * FROM EXTERNAL_TABLE) ->
PutHiveQL
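Concretely, the HiveQL produced by the two ReplaceText processors and executed by the two PutHiveQL steps looks something like this; the HDFS location and schema path are placeholders for my actual setup:

    -- first PutHiveQL: recreate the staging table over the freshly written Avro files
    DROP TABLE IF EXISTS EXTERNAL_TABLE;
    CREATE EXTERNAL TABLE EXTERNAL_TABLE
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/staging/my_table'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_table.avsc');

    -- second PutHiveQL: copy the staged Avro data into the managed Parquet table
    INSERT INTO TABLE FINAL_TABLE SELECT * FROM EXTERNAL_TABLE;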
I would like to know whether this approach is the right way to move data between these environments, or whether there is a more effective and efficient way to do it.
I'd also like to reduce the number of processors in my NiFi dataflows and, if possible, improve the efficiency of the data movement.