Invalid Data in Hive-Created Table?

222 Views Asked by At

I'm using Hive version 3.1.3 on Hadoop 3.3.4 with Tez 0.9.2. I'm trying to run a SELECT statement on table that Hive created and manages. The query never finishes and fails. The full error message is below, but this appears to be the relevant portion:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector

It looks like the error is a long to decimal conversion issue. However, this table was created by Hive, loading/transforming data in a previous step. Wouldn't Hive have thrown an error earlier if it was inserting an invalid value into a decimal column?

I used the exact same codebase and the exact same data on AWS EMR and didn't get this error, so I don't think there's an invalid value. But I'm stuck on where to go from here.

Here's the table definition:

claimid             varchar(50)
claimlineid         int
dos                 date
dosto               date
member              varchar(50)
provider            varchar(50)
setname             varchar(255)
code                varchar(50)
system              varchar(255)
primary             int
positivenegative    int
result              decimal(10,2)
supply              int
size                decimal(10,2)
quantity            decimal(10,2)

And here's the full error message:

Vertex failed, vertexName=Map 1, vertexId=vertex_1667735849290_0030_32_15, diagnostics=[Task failed, taskId=task_1667735849290_0030_32_15_000009, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1667735849290_0030_32_15_000009_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:488)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
        ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)].  Read field #14 at field start position 0 current read offset 104
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:611)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.closeOp(VectorMapJoinGenerateResultOperator.java:681)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:757)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:477)
        ... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)].  Read field #14 at field start position 0 current read offset 104
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:609)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:671)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:604)
        ... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 4096 at start offset 4 for length 100 to read 14 fields with types [varchar(50), int, date, date, varchar(50), varchar(50), varchar(255), varchar(50), varchar(255), int, decimal(10,2), int, decimal(10,2), decimal(10,2)].  Read field #14 at field start position 0 current read offset 104
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:589)
        ... 23 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
        at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:687)
        at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:934)
        at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.reProcessBigTable(VectorMapJoinGenerateResultOperator.java:585)
        ... 23 more
1

There are 1 best solutions below

0
Stanislav G. On

Unfortunately, this is a problem with CBO. You can disable it, run the expression and get the result. set hive.cbo.enable=false;