Spark CSV: Parse files delimited by ASCII æ (Hex E6)

511 Views Asked by Ashika Umanga Umagiliya At 18 May 2020 at 03:02

I have large data files delimited by the ASCII character æ (hex E6). My code snippet for parsing the files is as follows, but the parser does not seem to split the values properly (I use Spark 2.4.1):

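import org.apache.spark.sql.{DataFrame, DataFrameReader}
import org.apache.spark.sql.types.StructType

implicit class DataFrameReadImplicits(dataFrameReader: DataFrameReader) {
  // Reads a headerless file whose fields are separated by æ (U+00E6).
  def readTeradataCSV(schema: StructType, path: String): DataFrame = {
    dataFrameReader
      .option("delimiter", "\u00E6")
      .option("header", "false")
      .option("inferSchema", "false")
      .option("multiLine", "true")
      .option("encoding", "UTF-8")
      .schema(schema)
      .csv(path)
  }
}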

Sample file : https://gist.github.com/ashikaumanga/c2161eee07da9b10052a4e53bc4c567e

Any tips on how to fix this?
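
One thing that may be worth checking first (an assumption, not a confirmed diagnosis) is whether the file's encoding matches the `UTF-8` setting passed to the reader. Hex E6 lies outside 7-bit ASCII: it is the single-byte form of æ in ISO-8859-1 (Latin-1), whereas UTF-8 encodes æ as the two bytes C3 A6. The sketch below uses only the JVM standard library, with no Spark involved, to show how a line with a raw E6 byte splits under each decoding. The object name `DelimiterBytes`, the helper names, and the three-byte sample line are hypothetical and chosen for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DelimiterBytes {
  val delim: String = "\u00E6" // æ, the delimiter used in the question

  // Hex dump of a string's bytes under the given charset.
  def hex(s: String, cs: Charset): String =
    s.getBytes(cs).map(b => f"${b & 0xff}%02X").mkString(" ")

  // Decode raw bytes with the given charset, then count the fields split on æ.
  def fieldCount(raw: Array[Byte], cs: Charset): Int =
    new String(raw, cs).split(delim).length

  def main(args: Array[String]): Unit = {
    println(hex(delim, StandardCharsets.ISO_8859_1)) // E6
    println(hex(delim, StandardCharsets.UTF_8))      // C3 A6

    // Hypothetical line as stored on disk: 'a', raw byte E6, 'b'.
    val raw = Array('a'.toByte, 0xE6.toByte, 'b'.toByte)
    println(fieldCount(raw, StandardCharsets.ISO_8859_1)) // 2: the byte decodes to æ
    println(fieldCount(raw, StandardCharsets.UTF_8))      // 1: a lone E6 is malformed UTF-8
  }
}
```

If the files do contain a bare E6 byte, one experiment would be to pass `.option("encoding", "ISO-8859-1")` instead of `"UTF-8"`. Whether that alone fixes the split on Spark 2.4.1, especially together with `multiLine`, is not verified here.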
