I have a directory in HDFS where .csv files with a fixed structure and fixed column names are dumped at the end of every day.
I have a Hive table that should have new data appended to it at the beginning of every day, taken from the previous day's .csv file. How do I accomplish this?
Hive - how to automatically append data to a Hive table every day?
Asked by Naveen Reddy Marthala
There are 2 best solutions below

I suggest using cron jobs. Write a script that updates the table, then configure a cron job to run that script at a specific time each day (in your case, at the beginning of the day), so the table is updated automatically.
PS: this only works if the cron job runs on a machine that is up 24/7, such as a production server. If the machine may be powered off at the scheduled time, use Anacron instead.
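
As a rough illustration of the script that cron would run, here is a minimal sketch. It assumes the daily files are named by date (for example 2024-02-29.csv), and the directory `/data/incoming`, the table `daily_data`, and the script path in the crontab comment are all hypothetical names, not taken from the question. It relies on Hive's `LOAD DATA INPATH ... INTO TABLE` statement, which appends when `OVERWRITE` is not given, and on the `hive` CLI being on the PATH.

```python
import subprocess
import sys
from datetime import date, timedelta

# Hypothetical names: adjust to your cluster.
INCOMING_DIR = "/data/incoming"  # HDFS directory where the daily .csv files land
TABLE = "daily_data"             # existing Hive table with matching columns

# Example crontab entry (runs at 00:05 every day; the path is hypothetical):
#   5 0 * * * /usr/bin/python3 /opt/jobs/append_daily.py --run


def build_load_statement(today: date) -> str:
    """Return a HiveQL statement that appends the previous day's file."""
    previous_day = today - timedelta(days=1)
    # Assumes files are named like 2024-02-29.csv.
    path = f"{INCOMING_DIR}/{previous_day.isoformat()}.csv"
    # Without OVERWRITE, LOAD DATA adds the file to the table's existing data.
    return f"LOAD DATA INPATH '{path}' INTO TABLE {TABLE}"


def main(argv: list) -> int:
    statement = build_load_statement(date.today())
    if "--run" not in argv:
        # Dry run by default: only show what would be executed.
        print(statement)
        return 0
    # Assumes the `hive` CLI is installed and on PATH.
    return subprocess.run(["hive", "-e", statement]).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Running the script without `--run` only prints the statement, which makes it easy to check the computed path before scheduling it.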
Build a Hive table on top of that directory in HDFS. Once new files are dumped into the table location, a select from that table will pick them up. I'd suggest changing the process that dumps the files so it writes into date subfolders, and creating the table partitioned by date. After that, all you need to do is run the recover partitions command before selecting from the table.
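
The partitioned layout described above might look like the following minimal sketch, which only builds the HiveQL strings and the subfolder path. The table name `daily_data`, the location `/data/daily_data`, the partition column `dt`, and the two columns are hypothetical placeholders, since the question's actual schema is not shown. In stock Hive the recover-partitions step is `MSCK REPAIR TABLE`; on Amazon EMR the equivalent is `ALTER TABLE ... RECOVER PARTITIONS`. Both expect subfolders named in the `dt=YYYY-MM-DD` form.

```python
from datetime import date

# Hypothetical names and schema: replace with your own.
TABLE = "daily_data"
TABLE_LOCATION = "/data/daily_data"
COLUMNS = [("id", "INT"), ("value", "STRING")]


def create_table_statement() -> str:
    """DDL for an external table over the directory, partitioned by date."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in COLUMNS)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {TABLE} ({cols}) "
        "PARTITIONED BY (dt STRING) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"LOCATION '{TABLE_LOCATION}'"
    )


def partition_dir(day: date) -> str:
    """Subfolder the dumping process should write that day's .csv into."""
    return f"{TABLE_LOCATION}/dt={day.isoformat()}"


def repair_statement() -> str:
    """Statement that registers any new dt=... subfolders as partitions."""
    return f"MSCK REPAIR TABLE {TABLE}"


if __name__ == "__main__":
    print(create_table_statement())
    print(partition_dir(date.today()))
    print(repair_statement())
```

With this layout nothing has to be copied into the table: the daily job only needs to run the repair statement so that the new subfolder becomes visible to queries.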