Why does Airflow's Hive task (beeline) become stuck?


No error appears in the log below; the task simply stops in this state and never finishes.

However, if I connect to beeline directly and execute the same SQL statement, it works.

What's the problem? (For reference, a minimal sketch of the DAG that produced this log follows the log output below.)

Reading local file: /root/airflow/logs/Batch_Migration_Data/create_hive_table/2022-04-18T03:12:24.861251+00:00/1.log
[2022-04-18, 21:13:16 KST] {taskinstance.py:1037} INFO - Dependencies all met for <TaskInstance: Batch_Migration_Data.create_hive_table manual__2022-04-18T03:12:24.861251+00:00 [queued]>
[2022-04-18, 21:13:16 KST] {taskinstance.py:1037} INFO - Dependencies all met for <TaskInstance: Batch_Migration_Data.create_hive_table manual__2022-04-18T03:12:24.861251+00:00 [queued]>
[2022-04-18, 21:13:16 KST] {taskinstance.py:1243} INFO - 
--------------------------------------------------------------------------------
[2022-04-18, 21:13:16 KST] {taskinstance.py:1244} INFO - Starting attempt 1 of 1
[2022-04-18, 21:13:16 KST] {taskinstance.py:1245} INFO - 
--------------------------------------------------------------------------------
[2022-04-18, 21:13:16 KST] {taskinstance.py:1264} INFO - Executing <Task(HiveOperator): create_hive_table> on 2022-04-18 03:12:24.861251+00:00
[2022-04-18, 21:13:16 KST] {standard_task_runner.py:52} INFO - Started process 6630 to run task
[2022-04-18, 21:13:16 KST] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'Batch_Migration_Data', 'create_hive_table', 'manual__2022-04-18T03:12:24.861251+00:00', '--job-id', '21', '--raw', '--subdir', 'DAGS_FOLDER/batch_migration.py', '--cfg-path', '/tmp/tmpdxjoky8n', '--error-file', '/tmp/tmp4qqlfuol']
[2022-04-18, 21:13:16 KST] {standard_task_runner.py:77} INFO - Job 21: Subtask create_hive_table
[2022-04-18, 21:13:17 KST] {logging_mixin.py:109} INFO - Running <TaskInstance: Batch_Migration_Data.create_hive_table manual__2022-04-18T03:12:24.861251+00:00 [running]> on host airflow
[2022-04-18, 21:13:17 KST] {taskinstance.py:1431} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=Batch_Migration_Data
AIRFLOW_CTX_TASK_ID=create_hive_table
AIRFLOW_CTX_EXECUTION_DATE=2022-04-18T03:12:24.861251+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-04-18T03:12:24.861251+00:00
[2022-04-18, 21:13:17 KST] {hive.py:132} INFO - Executing: create external table ctf_20220418 (tid int, tdate varchar(255), ttime varchar(255), test1 FLOAT, test2 int, test3 int, test4 float, test5 FLOAT, test6 FLOAT, test7 FLOAT, test8 FLOAT, test9 FLOAT, D0010 FLOAT, test11 FLOAT, test12 FLOAT, tset13 FLOAT, test14 FLOAT, test15 FLOAT, test16 FLOAT, test17 FLOAT, test18 FLOAT, test19 FLOAT, defect_rate int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
[2022-04-18, 21:13:18 KST] {base.py:79} INFO - Using connection to: id: beeline_hive_container. Host: hive, Port: 10000, Schema: default, Login: root, Password: ***, extra: {'use_beeline': True, 'auth': 'NONE'}
[2022-04-18, 21:13:18 KST] {hive.py:150} INFO - Passing HiveConf: {'airflow.ctx.dag_owner': 'airflow', 'airflow.ctx.dag_id': 'Batch_Migration_Data', 'airflow.ctx.task_id': 'create_hive_table', 'airflow.ctx.execution_date': '2022-04-18T03:12:24.861251+00:00', 'airflow.ctx.dag_run_id': 'manual__2022-04-18T03:12:24.861251+00:00'}
[2022-04-18, 21:13:18 KST] {hive.py:242} INFO - beeline -u "jdbc:hive2://hive:10000/default;auth=NONE" -n root -p *** -hiveconf airflow.ctx.dag_id=Batch_Migration_Data -hiveconf airflow.ctx.task_id=create_hive_table -hiveconf airflow.ctx.execution_date=2022-04-18T03:12:24.861251+00:00 -hiveconf airflow.ctx.dag_run_id=manual__2022-04-18T03:12:24.861251+00:00 -hiveconf airflow.ctx.dag_owner=airflow -hiveconf airflow.ctx.dag_email= -hiveconf mapred.job.name=Airflow HiveOperator task for airflow.Batch_Migration_Data.create_hive_table.2022-04-18T03:12:24.861251+00:00 -f /tmp/airflow_hiveop_jhcjl4f1/tmp25idpdt6
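
Here is a minimal sketch of the DAG implied by the log above. The DAG id, task id, connection id, and HQL are taken from the log; the start_date, schedule, and the abbreviated column list are hypothetical fill-ins:

    # Minimal sketch of the setup shown in the log; not the asker's exact code.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.hive.operators.hive import HiveOperator

    with DAG(
        dag_id="Batch_Migration_Data",
        start_date=datetime(2022, 4, 1),   # hypothetical
        schedule_interval=None,            # the run in the log was triggered manually
        catchup=False,
    ) as dag:
        create_hive_table = HiveOperator(
            task_id="create_hive_table",
            # Connection 'beeline_hive_container' has use_beeline=True, so the
            # hook shells out to the beeline command line shown in the log.
            hive_cli_conn_id="beeline_hive_container",
            hql="""
                create external table ctf_20220418 (
                    tid int, tdate varchar(255), ttime varchar(255)
                    -- remaining columns abbreviated; see the full DDL in the log
                )
                ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
            """,
        )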

There is 1 answer below


This is a bug in Airflow's beeline (located in miniconda2/envs/airflow/bin/beeline) that prevents beeline from running in background mode.

You can find more information here:

https://issues.apache.org/jira/browse/HIVE-6758
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions

BTW: the documented workaround below did not work for me; in the end I hacked the spark-class script instead.

    export HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
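
If you would rather try that workaround from within Airflow than patch scripts, one possible approach (my own assumption, not something the bug report prescribes: it relies on beeline being spawned as a subprocess that inherits the task process's environment) is to export the variable from the DAG file itself:

    # Hedged sketch: the hook launches beeline via subprocess, which inherits
    # this process's environment, so setting the variable at DAG-file import
    # time should reach the beeline JVM. Verify against your provider version
    # before relying on it.
    import os

    os.environ["HADOOP_CLIENT_OPTS"] = "-Djline.terminal=jline.UnsupportedTerminal"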