I want to take a monthly / quarterly backup of both Hive metadata and Hive data together, for more than 1000 tables, with an easy way to restore. So far I have found the options below, but I am not sure which is best for backing up Hive tables in production. Any tips?
- Apache Falcon - http://saptak.in/writing/2015/08/11/mirroring-datasets-hadoop-clusters-apache-falcon
  - Pro: Readily available as an Ambari service, so installation is easy
  - Con: No community support
- Hortonworks DataFlow - https://docs.hortonworks.com.s3.amazonaws.com/HDPDocuments/Ambari-2.7.4.0/bk_ambari-upgrade-major/content/prepare_hive_for_upgrade.html
  - Pro: The most recent of the options
  - Con: Not much documentation to test with. Please share any resources on how to back up Hive with Hortonworks DataFlow.
- Other ways - Hive data backup with DistCp, Export/Import, or snapshots, plus Hive metadata backup using relational database dumps
  - Con: Not sure whether Hive data and Hive metadata get backed up at the same point in time. Implementing a monthly / quarterly scheduler is also time-consuming.
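
To make the third option concrete, here is a rough sketch of how one backup run could be chained together: dump the metastore database, take an HDFS snapshot of the warehouse directory right afterwards, then copy the snapshot with DistCp. It rests on several assumptions that are not confirmed for my cluster: the metastore is a MySQL database named `hive`, the warehouse directory has already been made snapshottable by an admin (`hdfs dfsadmin -allowSnapshot`), and the warehouse path and backup cluster URI shown are hypothetical placeholders. The dump and the snapshot are taken back to back but not atomically, so writes should be paused during that window. By default the script only prints the commands.

```python
import datetime
import subprocess


def build_backup_commands(warehouse_dir, backup_root, metastore_db, day):
    """Return the argv lists for one backup run, in execution order."""
    label = "backup-" + day.strftime("%Y%m%d")
    warehouse_dir = warehouse_dir.rstrip("/")
    backup_root = backup_root.rstrip("/")
    # HDFS exposes a snapshot as a read-only directory under <dir>/.snapshot/<name>.
    snapshot_path = warehouse_dir + "/.snapshot/" + label
    return [
        # 1. Metadata: dump the metastore database (assumed to be MySQL).
        ["mysqldump", "--single-transaction",
         "--result-file=" + label + "-metastore.sql", metastore_db],
        # 2. Data: freeze the warehouse directory as a named snapshot.
        ["hdfs", "dfs", "-createSnapshot", warehouse_dir, label],
        # 3. Copy the frozen snapshot to the backup location.
        ["hadoop", "distcp", snapshot_path, backup_root + "/" + label],
    ]


def run_backup(commands, dry_run=True):
    """Print the commands, or execute them in order and stop on the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with the real warehouse dir and backup target.
    commands = build_backup_commands(
        "/apps/hive/warehouse",
        "hdfs://backup-nn:8020/backups/hive",
        "hive",
        datetime.date.today(),
    )
    run_backup(commands, dry_run=True)
```

A script like this could be scheduled monthly from cron or Oozie. Restoring would mean copying the data back with DistCp and reloading the SQL dump into the metastore, which I have not tested.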