Automated insertion from Hive to Elasticsearch


I am currently trying to find a way to automatically add data from Hadoop text files to Elasticsearch. We are running Hive v0.11, Hadoop v2.0.5, Elasticsearch 1.7.1, and elasticsearch-hadoop v2.1.0. The files are stored in subfolders named year/month/day below the path /tmp/test-log/apache2log. The following table definition successfully reads the data from Hadoop:

CREATE EXTERNAL TABLE apache2log(
userIP STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
thread STRING,
link STRING,
callerInformation STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED by '|'
LOCATION '/tmp/test-log/apache2log';

But when I try to create a table that inserts this data into Elasticsearch, the creation succeeds but the table is empty. I tried the following command:

CREATE EXTERNAL TABLE apache2log(
userIP STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
thread STRING,
link STRING,
callerInformation STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED by '|'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
LOCATION '/tmp/test-log/apache2log'
TBLPROPERTIES(
'es.nodes'='1.2.3.4', 
'es.resource'='sam3/apache2',
'es.net.proxy.http.use.system.props'='false');

Settings changed from their defaults:

SET hive.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories = true;
SET hive.supports.subdirectories=true;
SET mapred.input.dir.recursive = true;
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

ADD JAR /usr/lib/gphd/hive-0.11.0_gphd_2_1_1_0/lib/elasticsearch-hadoop-2.1.0.jar;

I know it would be possible to create a second table for Elasticsearch and add the data using INSERT. But I need the process to be automated, so that data added to the files is inserted into the table as soon as it arrives in Hadoop.

Best answer:

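I don't think there is a way to do this. If there were, there would be no need to define storage handlers in separate external tables. As far as I know, a table stored by EsStorageHandler is backed by the Elasticsearch index rather than by the files under LOCATION, so it only contains what has been written to that index.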
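For reference, the two-table approach mentioned in the question could look roughly like the sketch below. It assumes the file-backed table apache2log from the question already exists, and it uses a hypothetical name, apache2log_es, for the Elasticsearch-backed table so the two do not clash. The jar path, node address, and index name are copied from the question.

```sql
-- Make the elasticsearch-hadoop storage handler available (path from the question).
ADD JAR /usr/lib/gphd/hive-0.11.0_gphd_2_1_1_0/lib/elasticsearch-hadoop-2.1.0.jar;

-- Sink table backed by the Elasticsearch index.
-- apache2log_es is a hypothetical name; ROW FORMAT and LOCATION are left out
-- because the data lives in Elasticsearch, not in HDFS files.
CREATE EXTERNAL TABLE apache2log_es(
  userIP STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  thread STRING,
  link STRING,
  callerInformation STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes'='1.2.3.4',
  'es.resource'='sam3/apache2',
  'es.net.proxy.http.use.system.props'='false');

-- Copy the rows from the file-backed table into Elasticsearch.
INSERT OVERWRITE TABLE apache2log_es
SELECT * FROM apache2log;
```

The INSERT still has to be triggered by something, for example a scheduled job that runs the script whenever new files arrive. Note that re-running it over files that were already loaded would probably index the same rows again, unless a document id is configured (for instance through the es.mapping.id setting) or the SELECT is restricted to the new data.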