hive - how to automatically append data to hive table every day?
343 Views Asked by Naveen Reddy Marthala

I have a directory in HDFS where .csv files with a fixed structure and column names are dumped at the end of every day, and they may look like this:

I have a Hive table that should have new data appended to it at the beginning of every day, with the data from the previous day's .csv file. How do I accomplish this?

There are 2 best solutions below

leftjoin
Build a Hive table on top of that directory in HDFS. After new files are dumped into the table location, a select from that table will pick up the new files. I'd suggest changing the process that dumps the files so it writes into date subfolders, and creating a table partitioned by date. All you need after that is to run the recover-partitions command before selecting from the table.
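For example, a minimal sketch of that setup. All names here are illustrative assumptions, not from the question: the table daily_data, the location /data/daily, the partition column dt, and the three sample columns.

    -- Assumed layout: the dump process writes to /data/daily/dt=2021-01-15/file.csv
    CREATE EXTERNAL TABLE IF NOT EXISTS daily_data (
        id    INT,
        name  STRING,
        value DOUBLE
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/daily'
    TBLPROPERTIES ('skip.header.line.count'='1');

    -- After a new dt=... subfolder has been dumped, register it before querying:
    MSCK REPAIR TABLE daily_data;

    -- A plain select now also sees the previous day's data:
    SELECT * FROM daily_data WHERE dt = '2021-01-15';

With this layout, nothing needs to be "appended" to the table at all; the table simply reflects whatever files are present under its location once the new partition is registered.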
I can suggest using cron jobs. You create a script that updates the table, and you configure a cron job to execute that script at a specific time of the day (in your case, the beginning of the day); the table then gets updated automatically.

PS: this solution only applies if your server runs continuously (24/7), since cron only fires while the machine is up; otherwise, you should use Anacron.
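A minimal sketch of that approach, reusing the illustrative daily_data table from the first answer; the script path, log path, and schedule are assumptions.

    #!/usr/bin/env bash
    # load_daily.sh -- register the previous day's partition in the Hive table (illustrative names)
    set -euo pipefail

    # GNU date; yields e.g. 2021-01-15
    YESTERDAY=$(date -d "yesterday" +%F)

    # If the dump process already writes into /data/daily/dt=<date>/ subfolders,
    # recovering partitions is enough; otherwise add the partition explicitly.
    hive -e "MSCK REPAIR TABLE daily_data;"
    # hive -e "ALTER TABLE daily_data ADD IF NOT EXISTS PARTITION (dt='${YESTERDAY}');"

And the corresponding crontab entry (crontab -e), here scheduled for 00:30 every day:

    30 0 * * * /opt/scripts/load_daily.sh >> /var/log/load_daily.log 2>&1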