Error when writing a DataFrame to Hive in ORC or Parquet format in a dev CDP environment (Spark, Scala)


All of the use cases we tested in our previous HDP environment work, so we wanted to shift them to CDP. Whenever I try to write a CSV-sourced DataFrame to Hive, it gives me the error below. I have tried every library I could think of. The CSV is read from HDFS into a DataFrame, and printing the schema of the DataFrame shows it is correct.
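A minimal sketch of the flow in question (the paths, options, and table names are assumptions for illustration; the delimiter, quote, and escape values mirror the parser configuration printed in the error below):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CsvToHiveOrc")
  .enableHiveSupport() // needed for saveAsTable against the Hive metastore
  .getOrCreate()

// Read the '~'-delimited file from HDFS into a DataFrame; the path is
// a hypothetical stand-in, the options match the trace below.
val df = spark.read
  .option("header", "true")
  .option("delimiter", "~")
  .option("quote", "\"")
  .option("escape", "\\")
  .csv("hdfs:///data/input/loans.csv")

df.printSchema() // prints the expected schema, as described above

// Write to Hive in ORC format; database and table names are hypothetical.
df.write
  .format("orc")
  .mode("overwrite")
  .saveAsTable("mydb.loans_orc")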

21/07/14 12:13:56 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 (TID 39, datanode2.baf.com, executor 1): com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - -1
Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:
    Auto configuration enabled=true
    Auto-closing enabled=true
    Autodetect column delimiter=false
    Autodetect quotes=false
    Column reordering enabled=true
    Delimiters for detection=null
    Empty value=
    Escape unquoted values=false
    Header extraction enabled=null
    Headers=null
    Ignore leading whitespaces=false
    Ignore leading whitespaces in quotes=false
    Ignore trailing whitespaces=true
    Ignore trailing whitespaces in quotes=false
    Input buffer size=128
    Input reading on separate thread=false
    Keep escape sequences=false
    Keep quotes=false
    Length of content displayed on error=-1
    Line separator detection enabled=false
    Maximum number of characters per column=5000000
    Maximum number of columns=20480
    Normalize escaped line separators=true
    Null value=
    Number of records to read=all
    Processor=none
    Restricting data in exceptions=false
    RowProcessor error handler=null
    Selected fields=none
    Skip bits as whitespace=true
    Skip empty lines=true
    Unescaped quote handling=STOP_AT_DELIMITER
Format configuration:
    CsvFormat:
        Comment character=\0
        Field delimiter=~
        Line separator (normalized)=\n
        Line separator sequence=\n
        Quote character="
        Quote escape character=\
        Quote escape escape character=null
Internal state when error was thrown: line=54, column=61, record=54, charIndex=133505, headers=[LEAD_CO_MNE, BRANCH_CO_MNE, MIS_DATE, @ID, CONTRACT_DATE, VALUE_DATE, START_DATE, DRAWDOWN_END_DATE, PAYMENT_START_DATE, MATURITY_DATE, ARR_AGE_STATUS, RENEWAL_DATE, COOLING_DATE, CANCEL_DATE, BASE_DATE, BILL_PAY_DATE, BILL_ID, ACTIVITY_REF, BILL_DATE, BILL_TYPE, PAY_METHOD, BILL_STATUS, SET_STATUS, AGING_STATUS, NXT_AGE_DATE, CHASER_DATE, ALL_AGE_STATUS, SUSPENDED, REPORT_END_DATE, PAYMENT_TYPE, NUM_PAYMENTS, PROPERTY, PAYMENT_DATE, ACT_PAY_DATE, FIN_PAY_DATE, REPAY_REFERENCE, RPY_BILL_ID, SUSP_STATUS, SUSP_DATE, LAST_RENEW_DATE, PAYMENT_END_DATE, BILLS_SETTLED_CNT, STATIC_UPDATE, RESERVED_5, RPY_REFERENCE, RESERVED_4, RESERVED_3, RPY_ACTUAL_DATE, ACTUAL_RENEW_DATE, RESERVED_6, RESERVED_7, RESERVED_8, RESERVED_9, RESERVED_10, RESERVED_11, RESERVED_12, RESERVED_13, RESERVED_14, RESERVED_15, RESERVED_16, RESERVED_17, RESERVED_18]
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:395)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:616)
    at org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.next(UnivocityParser.scala:331)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:464)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:645)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:227)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:116)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at com.univocity.parsers.common.input.AbstractCharInputReader.getString(AbstractCharInputReader.java:482)
    at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:185)
    at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:108)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:574)
    ... 24 more
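Note that the trace points at the CSV scan (UnivocityParser, FileScanRDD), not at the ORC or Parquet write: Spark evaluates lazily, so the parser only hits the malformed row (line 54, per the internal state above) when the write triggers the read. One quick way to find rows whose field count disagrees with the 62-column header, sketched here with a hypothetical path, is to scan the raw text:

// Read the file as plain text and flag lines whose '~' count differs
// from the header line's (approximate: ignores '~' inside quoted fields).
// The path is a hypothetical stand-in.
val raw = spark.read.textFile("hdfs:///data/input/loans.csv")
val expected = raw.first().count(_ == '~')
raw.filter(line => line.count(_ == '~') != expected).show(5, false)

If every count matches, a stray unescaped quote is another common trigger for this particular exception, given Quote character=" and Unescaped quote handling=STOP_AT_DELIMITER in the configuration above.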

1 Answer


Can you please share:

1. The version of your CDP.
2. Your sample data.
3. Your Hive table DDL.

I guess you might require a Hive connector jar; in CDP that is typically the Hive Warehouse Connector (HWC).
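If that is the case, a minimal write through the Hive Warehouse Connector looks roughly like this (a sketch only: it assumes the HWC jar is on the Spark classpath, the table name is hypothetical, and the exact API varies by CDP version):

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession;
// reads can then go through hive.executeQuery("SELECT ...").
val hive = HiveWarehouseSession.session(spark).build()

// Write through the connector instead of plain saveAsTable;
// "mydb.loans_orc" is a hypothetical stand-in.
df.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .mode("append")
  .option("table", "mydb.loans_orc")
  .save()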