Error when writing a DataFrame to Hive in ORC or Parquet format in a dev CDP environment (Spark, Scala)


All of the use cases we tested in our previous HDP environment work, so we wanted to shift them to CDP. Whenever I try to write a CSV-sourced DataFrame to Hive, it gives me the error below. I have tried every library I could think of. The CSV is read from HDFS into a DataFrame, and printing the schema of the DataFrame shows it is correct.
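A minimal sketch of the flow in question (the paths, options, and table names are assumptions for illustration; the delimiter, quote, and escape values mirror the parser configuration printed in the error below):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CsvToHiveOrc")
  .enableHiveSupport() // needed for saveAsTable against the Hive metastore
  .getOrCreate()

// Read the '~'-delimited file from HDFS into a DataFrame; the path is
// a hypothetical stand-in, the options match the trace below.
val df = spark.read
  .option("header", "true")
  .option("delimiter", "~")
  .option("quote", "\"")
  .option("escape", "\\")
  .csv("hdfs:///data/input/loans.csv")

df.printSchema() // prints the expected schema, as described above

// Write to Hive in ORC format; database and table names are hypothetical.
df.write
  .format("orc")
  .mode("overwrite")
  .saveAsTable("mydb.loans_orc")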

21/07/14 12:13:56 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 (TID 39, datanode2.baf.com, executor 1): com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - -1
Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:
    Auto configuration enabled=true
    Auto-closing enabled=true
    Autodetect column delimiter=false
    Autodetect quotes=false
    Column reordering enabled=true
    Delimiters for detection=null
    Empty value=
    Escape unquoted values=false
    Header extraction enabled=null
    Headers=null
    Ignore leading whitespaces=false
    Ignore leading whitespaces in quotes=false
    Ignore trailing whitespaces=true
    Ignore trailing whitespaces in quotes=false
    Input buffer size=128
    Input reading on separate thread=false
    Keep escape sequences=false
    Keep quotes=false
    Length of content displayed on error=-1
    Line separator detection enabled=false
    Maximum number of characters per column=5000000
    Maximum number of columns=20480
    Normalize escaped line separators=true
    Null value=
    Number of records to read=all
    Processor=none
    Restricting data in exceptions=false
    RowProcessor error handler=null
    Selected fields=none
    Skip bits as whitespace=true
    Skip empty lines=true
    Unescaped quote handling=STOP_AT_DELIMITER
Format configuration:
    CsvFormat:
        Comment character=\0
        Field delimiter=~
        Line separator (normalized)=\n
        Line separator sequence=\n
        Quote character="
        Quote escape character=\
        Quote escape escape character=null
Internal state when error was thrown: line=54, column=61, record=54, charIndex=133505, headers=[LEAD_CO_MNE, BRANCH_CO_MNE, MIS_DATE, @ID, CONTRACT_DATE, VALUE_DATE, START_DATE, DRAWDOWN_END_DATE, PAYMENT_START_DATE, MATURITY_DATE, ARR_AGE_STATUS, RENEWAL_DATE, COOLING_DATE, CANCEL_DATE, BASE_DATE, BILL_PAY_DATE, BILL_ID, ACTIVITY_REF, BILL_DATE, BILL_TYPE, PAY_METHOD, BILL_STATUS, SET_STATUS, AGING_STATUS, NXT_AGE_DATE, CHASER_DATE, ALL_AGE_STATUS, SUSPENDED, REPORT_END_DATE, PAYMENT_TYPE, NUM_PAYMENTS, PROPERTY, PAYMENT_DATE, ACT_PAY_DATE, FIN_PAY_DATE, REPAY_REFERENCE, RPY_BILL_ID, SUSP_STATUS, SUSP_DATE, LAST_RENEW_DATE, PAYMENT_END_DATE, BILLS_SETTLED_CNT, STATIC_UPDATE, RESERVED_5, RPY_REFERENCE, RESERVED_4, RESERVED_3, RPY_ACTUAL_DATE, ACTUAL_RENEW_DATE, RESERVED_6, RESERVED_7, RESERVED_8, RESERVED_9, RESERVED_10, RESERVED_11, RESERVED_12, RESERVED_13, RESERVED_14, RESERVED_15, RESERVED_16, RESERVED_17, RESERVED_18]
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:395)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:616)
    at org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.next(UnivocityParser.scala:331)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:464)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:645)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:227)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:116)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at com.univocity.parsers.common.input.AbstractCharInputReader.getString(AbstractCharInputReader.java:482)
    at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:185)
    at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:108)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:574)
    ... 24 more
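Note that the trace points at the CSV scan (UnivocityParser, FileScanRDD), not at the ORC or Parquet write: Spark evaluates lazily, so the parser only hits the malformed row (line 54, per the internal state above) when the write triggers the read. One quick way to find rows whose field count disagrees with the 62-column header, sketched here with a hypothetical path, is to scan the raw text:

// Read the file as plain text and flag lines whose '~' count differs
// from the header line's (approximate: ignores '~' inside quoted fields).
// The path is a hypothetical stand-in.
val raw = spark.read.textFile("hdfs:///data/input/loans.csv")
val expected = raw.first().count(_ == '~')
raw.filter(line => line.count(_ == '~') != expected).show(5, false)

If every count matches, a stray unescaped quote is another common trigger for this particular exception, given Quote character=" and Unescaped quote handling=STOP_AT_DELIMITER in the configuration above.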

1 Answer


Can you please share:

1. The version of your CDP.
2. Your sample data.
3. Your Hive table DDL.

I guess you might require a Hive connector jar; in CDP that is typically the Hive Warehouse Connector (HWC).
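If that is the case, a minimal write through the Hive Warehouse Connector looks roughly like this (a sketch only: it assumes the HWC jar is on the Spark classpath, the table name is hypothetical, and the exact API varies by CDP version):

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession;
// reads can then go through hive.executeQuery("SELECT ...").
val hive = HiveWarehouseSession.session(spark).build()

// Write through the connector instead of plain saveAsTable;
// "mydb.loans_orc" is a hypothetical stand-in.
df.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .mode("append")
  .option("table", "mydb.loans_orc")
  .save()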