Hive / Tez job won't start


I am trying to create an ORC table in Hive by importing from a text file in HDFS. I have tried multiple approaches and searched online for help, but regardless of what I do, the insert job never starts.

I can get the text file to HDFS, I can read the text file to Hive, but I cannot convert from that to ORC.

I have tried many variations, including this one from the Hortonworks documentation, which can serve as a reference for this question:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/moving_data_from_hdfs_to_hive_external_table_method.html

I have a single-node HDP cluster (used for development), version HDP-2.3.2.0 (2.3.2.0-2950).

And here are the relevant service versions:

| Service    | Version    | Status    | Description |
|------------|------------|-----------|-------------|
| HDFS       | 2.7.1.2.3  | Installed | Apache Hadoop Distributed File System |
| MapReduce2 | 2.7.1.2.3  | Installed | Apache Hadoop NextGen MapReduce (YARN) |
| YARN       | 2.7.1.2.3  | Installed | Apache Hadoop NextGen MapReduce (YARN) |
| Tez        | 0.7.0.2.3  | Installed | Tez is the next-generation Hadoop query processing framework written on top of YARN |
| Hive       | 1.2.1.2.3  | Installed | Data warehouse system for ad hoc queries and analysis of large datasets, and table and storage management service |

Here is what happens when I run SQL like this (again, I've tried many variations, including examples taken directly from online tutorials):

```sql
INSERT OVERWRITE TABLE mycars SELECT * FROM cars;
```
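For reference, here is a minimal version of the pattern I'm following, modeled on the Hortonworks guide linked above (the table and column names are illustrative, not my actual schema):

```sql
-- Text-backed staging table pointing at the raw file in HDFS
CREATE EXTERNAL TABLE cars (
  make  STRING,
  model STRING,
  year  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/staging/cars';

-- ORC-backed target table
CREATE TABLE mycars (
  make  STRING,
  model STRING,
  year  INT
)
STORED AS ORC;

-- This is the statement that submits the Tez job and then hangs
INSERT OVERWRITE TABLE mycars SELECT * FROM cars;
```

The two CREATE statements and the plain-text load all succeed; only the final INSERT misbehaves.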

My job stays like this:

```
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id                           Application-Name  Application-Type  User  Queue      State     Final-State  Progress  Tracking-URL
application_1455989658079_0002  HIVE-3f41161c-b806-4e7d-974e-c18e028d683f               TEZ  hive  root.hive  ACCEPTED  UNDEFINED    0%        N/A
```

And it just hangs there. (Literally: I've tried a 20-row sample table and let it run for hours before killing it.)
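For completeness, this is how I've been killing the stuck job from the command line (the application ID is the one from the `yarn application -list` output above):

```shell
# Kill the application that is stuck in ACCEPTED
yarn application -kill application_1455989658079_0002
```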

I am by no means a Hadoop expert (yet) and am sure it's probably a configuration issue, but I have been unable to figure it out.

All other Hive operations I've tried work fine: creating and dropping tables, loading a file into a text table, selects. It's only when I insert into an ORC table that it hangs like this, and I need an ORC table for my requirement.

Any advice would be helpful.

1 Answer

Most of the time this comes down to increasing your YARN scheduling capacity. If your resources are already capped, you can instead reduce the amount of memory requested by individual Tez tasks by adjusting the following property in the Tez configuration:

tez.task.resource.memory.mb
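For example, you can lower the value for a single Hive session before rerunning the insert, leaving the cluster-wide defaults alone (the 512 MB figures below are only illustrative; pick values that fit your node):

```sql
-- Request smaller Tez task containers for this session only
SET tez.task.resource.memory.mb=512;
-- The Tez Application Master container can be shrunk the same way
SET tez.am.resource.memory.mb=512;

INSERT OVERWRITE TABLE mycars SELECT * FROM cars;
```

On a small single-node box, an oversized AM or task request alone can exceed what the queue will grant, which leaves the application parked in ACCEPTED exactly as shown in the question.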

To increase the cluster's capacity, adjust the YARN configuration settings, either directly or through Ambari or Cloudera Manager.
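If you prefer editing the files directly, the relevant knobs live in `yarn-site.xml` (the 4096/1024 values below are just an example sized for a small single-node machine):

```xml
<!-- yarn-site.xml -->
<!-- Total memory YARN may hand out on this node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<!-- Largest single container YARN will grant -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
<!-- Smallest container granularity -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```

Restart YARN (and the NodeManager) after changing these so the new limits take effect.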


To monitor what is happening under the hood, open the YARN ResourceManager UI and check the Diagnostics tab of the specific application. It contains useful, explicit messages about resource allocation, especially when a job is accepted but stays pending.
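The same diagnostics are also available from the command line, if that is easier than the UI (the application ID here is the one from the question):

```shell
# Show state, diagnostics, and tracking URL for the pending application
yarn application -status application_1455989658079_0002

# List everything stuck waiting for resources
yarn application -list -appStates ACCEPTED
```

The diagnostics line in the `-status` output typically states which resource (memory or vcores) the scheduler could not satisfy.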
