hive - How to automatically append data to a Hive table every day?


I have a directory in HDFS where .csv files with a fixed structure and fixed column names are dumped at the end of every day.
I have a Hive table that should have new data appended to it at the beginning of every day, taken from the previous day's .csv file. How do I accomplish this?


There are 2 best solutions below


Build a Hive table on top of that directory in HDFS. Once new files are dumped into the table location, a select from that table will pick up the new files. I'd suggest changing the process that dumps the files so that it writes into date subfolders, and creating a table partitioned by date. After that, all you need to do is run the recover partitions command (`MSCK REPAIR TABLE` in Hive) before querying the table.
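As a rough sketch of what that could look like, the Python helpers below only build the HiveQL strings; they do not talk to Hive. Everything specific here is an assumption for illustration: the table name `daily_events`, the directory `/data/daily_events`, the three columns (the real ones are not shown in the question), the partition column `dt`, and a header row in each .csv that should be skipped. Note that `MSCK REPAIR TABLE` only discovers subfolders named in the `dt=YYYY-MM-DD` style; with any other folder naming you would register each partition explicitly with `ALTER TABLE ... ADD PARTITION`.

```python
from datetime import date

TABLE = "daily_events"            # hypothetical table name
LOCATION = "/data/daily_events"   # hypothetical HDFS directory


def create_table_sql(table=TABLE, location=LOCATION):
    """DDL for an external CSV table partitioned by a date string."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  id INT,\n"           # hypothetical columns: replace with
        "  name STRING,\n"      # the real CSV column names and types
        "  amount DOUBLE\n"
        ")\n"
        "PARTITIONED BY (dt STRING)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        # Assumes each file starts with one header line of column names.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def repair_sql(table=TABLE):
    """Scan the table location and register any dt=... subfolders."""
    return f"MSCK REPAIR TABLE {table}"


def add_partition_sql(day, table=TABLE, location=LOCATION):
    """Register a single day's subfolder explicitly instead of scanning."""
    dt = day.isoformat()
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{location}/dt={dt}'"
    )
```

You would run `create_table_sql()` once, then either `repair_sql()` or `add_partition_sql(day)` after each day's file lands. Because the table is external and points at the dump directory, no data is copied; the new file becomes visible as soon as its partition is registered.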


I'd suggest using cron jobs. Create a script that updates the table, and configure a cron job to run that script at a specific time each day (in your case, the beginning of the day); the table will then be updated automatically.

PS: this only works if the server is running around the clock, as a production server would be, because cron does not catch up on jobs that were missed while the machine was off. Otherwise, you should use Anacron.
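A minimal sketch of such a script, in Python. It assumes the `hive` command-line client is on the PATH (on newer clusters you may need `beeline` instead), that the table is partitioned by a `dt` string, and that each day's file lands in a `dt=YYYY-MM-DD` subfolder. The table name, HDFS directory, and script path are all hypothetical. By default it only prints the command; pass `--execute` to actually run it.

```python
#!/usr/bin/env python3
# Example crontab entry (hypothetical path), runs at 00:05 every day:
#   5 0 * * * /usr/bin/python3 /opt/jobs/append_daily.py --execute
import subprocess
import sys
from datetime import date, timedelta

TABLE = "daily_events"            # hypothetical table name
LOCATION = "/data/daily_events"   # hypothetical HDFS directory


def build_command(today):
    """Return the hive CLI call that registers yesterday's partition."""
    dt = (today - timedelta(days=1)).isoformat()
    statement = (
        f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{LOCATION}/dt={dt}'"
    )
    return ["hive", "-e", statement]


if __name__ == "__main__":
    command = build_command(date.today())
    if "--execute" in sys.argv[1:]:
        # check=True raises if hive exits with a non-zero status,
        # so a failed run shows up in cron's mail or log output.
        subprocess.run(command, check=True)
    else:
        # Dry run: only show what would be executed.
        print(" ".join(command))
```

The statement uses `IF NOT EXISTS`, so running the job twice for the same day is harmless.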