Hive TEZ is taking very long time to run the query

Question

Asked by Varshini on 18 November 2018 at 19:51 · 2.1k views

I'm fairly new to Hive and Hadoop. I have a query that takes 10 minutes to complete.

The data size is 10 GB. Table statistics: Num rows: 4457541 Data size: 1854337449 Basic stats: COMPLETE Column stats: COMPLETE

The table is partitioned and bucketed.

How can I improve the query below?

select * from tbl1 where clmn='Abdul' and loc='IND' and TO_UNIX_TIMESTAMP(ts) > (UNIX_TIMESTAMP() - 5*60*60);

These are the settings I tried:

set hive.vectorized.execution.reduce.enabled=true;
set hive.tez.container.size=8192;
set hive.fetch.task.conversion = none;
set mapred.compress.map.output=true;
set mapred.output.compress=true;
set hive.fetch.task.conversion=none;


Explain

Plan not optimized by CBO.

Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Map 1
         File Output Operator [FS_2973]
            compressed:false
            Statistics:Num rows: 49528 Data size: 24516360 Basic stats: COMPLETE Column stats: COMPLETE
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Select Operator [SEL_2972]
               outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
               Statistics:Num rows: 49528 Data size: 24516360 Basic stats: COMPLETE Column stats: COMPLETE
               Filter Operator [FIL_2971]
                  predicate:((section = 'xysaa') and (to_unix_timestamp(ts) > (unix_timestamp() - 18000))) (type: boolean)
                  Statistics:Num rows: 49528 Data size: 24516360 Basic stats: COMPLETE Column stats: COMPLETE
                  TableScan [TS_2970]
                     ACID table:true
                     alias:pp
                     Statistics:Num rows: 4457541 Data size: 1854337449 Basic stats: COMPLETE Column stats: COMPLETE

None of these parameters reduced the query's run time.

**leftjoin** · Answer 1 · 2018-11-19T19:12:26.103000

According to the plan, the query runs map-only (a single Map 1 vertex, no reducer) and vectorization is not enabled. Try this:

set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled=true;
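To check whether vectorization actually takes effect after changing these settings, one option is to ask Hive for a vectorization-aware plan. This is only a sketch: it assumes Hive 2.3 or later, where EXPLAIN VECTORIZATION is available, and it reuses the table and column names from the question.

```sql
-- Enable vectorized execution for this session (same settings as above).
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;

-- Assumption: Hive 2.3+. The plan should report whether vectorization
-- is enabled and, for vertices that are not vectorized, the reason.
explain vectorization
select * from tbl1
where clmn='Abdul' and loc='IND'
  and to_unix_timestamp(ts) > (unix_timestamp() - 5*60*60);
```

If the plan still reports the map vertex as not vectorized, the listed reason (for example an unsupported input format or expression) is the thing to fix first.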

Tune mapper parallelism:

set tez.grouping.max-size=67108864;
set tez.grouping.min-size=32000000;

Adjust these values to increase the number of mappers running. Ideally, the query should run without this setting:

set hive.tez.container.size=8192;

One more recommendation is to replace unix_timestamp() with UNIX_TIMESTAMP(current_timestamp). Called without arguments, unix_timestamp() is non-deterministic: its value is not fixed for the duration of the query, which prevents proper optimization. This form has been deprecated since Hive 2.0 in favor of the CURRENT_TIMESTAMP constant.

(UNIX_TIMESTAMP(current_timestamp) - 5*60*60)
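Applied to the query from the question, the change would look roughly like this. The table and column names (tbl1, clmn, loc, ts) are taken from the question as posted; only the right-hand side of the time comparison changes.

```sql
-- Same filter as the original query, with the no-argument unix_timestamp()
-- replaced by unix_timestamp(current_timestamp), which is fixed for the
-- duration of the query.
select *
from tbl1
where clmn = 'Abdul'
  and loc = 'IND'
  and to_unix_timestamp(ts) > (unix_timestamp(current_timestamp) - 5*60*60);
```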

Also, your files are very small: the partition size is 200-500, with 12 files per partition at 20-50 MB each. Fortunately the table is ORC, so you can concatenate the files using the ALTER TABLE ... CONCATENATE command. That said, 12 files is not a big deal, and you probably will not notice an improvement when querying a single partition.
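For reference, a concatenate request for one partition could look like the sketch below. The partition column and value (part_col, 'some_value') are hypothetical placeholders, since the question does not say how the table is partitioned. Note also that the plan shows ACID table:true; as far as I know, CONCATENATE is either rejected or handled as a compaction on transactional tables, depending on the Hive version, so a major compaction may be the relevant operation there.

```sql
-- Hypothetical partition spec: replace part_col and 'some_value'
-- with the table's real partition column(s) and value(s).
alter table tbl1 partition (part_col='some_value') concatenate;

-- Assumption: for a transactional (ACID) table, requesting a major
-- compaction is the analogous way to merge small files.
alter table tbl1 partition (part_col='some_value') compact 'major';
```

Only one of the two statements would be needed, depending on whether the table is transactional.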

See also this answer: https://stackoverflow.com/a/48487306/2700344
