Flink 1.1.3: Interacting with Hive 2.1.0


Apologies for the inconvenience, but I did not find an answer in the documentation or on the Internet.

I have a platform with :

  • Hadoop 2.7.3
  • Hive 2.1.0
  • Hbase 1.2.4
  • Spark 1.6

I have integrated Flink 1.1.3 to use it in local mode and YARN mode.

I'm interested in using Flink with Hive (like HiveContext with Spark) to read data in the Scala shell. Is it possible, and how?

Regards.


There are 2 answers below.

Answer 1:

As of Flink 1.9.0, Flink officially supports Hive integration: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/
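
With that integration you can register a HiveCatalog and query Hive tables directly through the Table API. A minimal sketch, assuming Flink 1.9+ with the flink-connector-hive dependency on the classpath; the catalog name, default database, conf directory, and table name here are placeholders:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inBatchMode()
        .build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

// Register a catalog backed by the Hive Metastore
HiveCatalog hive = new HiveCatalog(
        "myhive",          // catalog name (placeholder)
        "mydb",            // default database (placeholder)
        "/etc/hive/conf",  // directory containing hive-site.xml (placeholder)
        "2.1.0");          // Hive version
tableEnv.registerCatalog("myhive", hive);
tableEnv.useCatalog("myhive");

// Query a Hive table through the registered catalog
Table result = tableEnv.sqlQuery("SELECT col1, col2 FROM mytable");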

Are you still looking into this option?

Answer 2:

Flink does not support a direct connection to Hive the way Spark does with its SQL context. But there is a simple way to analyze the data in a Hive table with Flink, using the Flink Table API.

What you need to do first is get the exact HDFS location of the Hive table you wish to analyze with Flink, e.g.

hdfs://app/hive/warehouse/mydb/mytable
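
(If you are unsure of the path, running DESCRIBE FORMATTED mytable in the Hive shell prints it under the Location field.)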

Then you read the data:

// Read the table's underlying file as CSV into POJOs of type MyClass
DataSet<MyClass> csvInput = env
            .readCsvFile("hdfs://app/hive/warehouse/mydb/mytable/data.csv")
            .pojoType(MyClass.class, "col1", "col2", "col3");
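
The snippet assumes an ExecutionEnvironment named env and a POJO class matching the table's columns, roughly like this (the class and field names are illustrative):

import org.apache.flink.api.java.ExecutionEnvironment;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// POJO matching the table's columns; Flink POJO types need a public
// no-argument constructor and public fields (or getters/setters)
public class MyClass {
    public String col1;
    public String col2;
    public String col3;
    public MyClass() {}
}

One caveat: Hive tables stored as TEXTFILE use '\001' (Ctrl-A) as the default field delimiter rather than a comma, so you may need to set the delimiter on the reader, e.g. .fieldDelimiter("\u0001"), and the exact file names under the table directory depend on how the data was written.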

Then you need to create a Table from the DataSet and register it with the TableEnvironment:

Table mytable = tableEnv.fromDataSet(csvInput);
tableEnv.registerTable("mytable", mytable);
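
Here, tableEnv is a batch table environment created from the same execution environment. In the 1.x API this looked roughly as follows (package names moved between 1.x releases, so check the docs for your exact version):

import org.apache.flink.api.java.table.BatchTableEnvironment;
import org.apache.flink.api.table.TableEnvironment;

BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);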

And now you are all set to query this table using Table API syntax.
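
For example, a simple projection and filter using the string expression syntax, converted back to a DataSet (the filter value is a placeholder):

// Project and filter with the Table API's string expressions
Table result = mytable
        .select("col1, col2, col3")
        .where("col3 === 'someValue'");

// Convert back to a DataSet to print or process further
DataSet<MyClass> output = tableEnv.toDataSet(result, MyClass.class);
output.print();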

Here is a link to the sample code.

Hope this helps.