Spark CSV: Parse files delimited by the character æ (hex E6)


I have large data files delimited by the character æ (hex E6). My code snippet for parsing the files is below, but the parser does not seem to split the values properly (I use Spark 2.4.1).

import org.apache.spark.sql.{DataFrame, DataFrameReader}
import org.apache.spark.sql.types.StructType

// Adds a readTeradataCSV helper to DataFrameReader.
implicit class DataFrameReadImplicits(dataFrameReader: DataFrameReader) {
  def readTeradataCSV(schema: StructType, path: String): DataFrame = {
    dataFrameReader
      .option("delimiter", "\u00E6") // æ, U+00E6
      .option("header", "false")
      .option("inferSchema", "false")
      .option("multiLine", "true")
      .option("encoding", "UTF-8")
      .schema(schema)
      .csv(path)
  }
}

Sample file: https://gist.github.com/ashikaumanga/c2161eee07da9b10052a4e53bc4c567e

Any tips on how to fix this?
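
One thing that may be worth checking is how the delimiter is actually stored on disk. The description "hex E6" suggests a single byte, which is how æ is encoded in ISO-8859-1 (Latin-1), whereas in UTF-8 the same character takes two bytes. The following is a minimal sketch in plain Scala (no Spark) that shows the difference; the object name `DelimiterBytes` and its members are hypothetical and only for illustration.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper, for illustration only: compares how the delimiter
// U+00E6 (æ) is represented under two encodings.
object DelimiterBytes {
  val delimiter: String = "\u00E6"

  // Render a byte array as space-separated upper-case hex pairs.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  // Encoding the delimiter: one byte in Latin-1, two bytes in UTF-8.
  val latin1Hex: String = hex(delimiter.getBytes(StandardCharsets.ISO_8859_1))
  val utf8Hex: String   = hex(delimiter.getBytes(StandardCharsets.UTF_8))

  // Decoding a lone 0xE6 byte: Latin-1 yields æ, while UTF-8 treats it as
  // malformed input and substitutes the replacement character U+FFFD.
  val rawByte: Array[Byte]    = Array(0xE6.toByte)
  val decodedAsLatin1: String = new String(rawByte, StandardCharsets.ISO_8859_1)
  val decodedAsUtf8: String   = new String(rawByte, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    println(s"Latin-1 bytes: $latin1Hex")
    println(s"UTF-8 bytes:   $utf8Hex")
    println(s"0xE6 decoded as Latin-1 equals delimiter: ${decodedAsLatin1 == delimiter}")
    println(s"0xE6 decoded as UTF-8 equals delimiter:   ${decodedAsUtf8 == delimiter}")
  }
}
```

If the files really do contain a single E6 byte between fields (an assumption, not verified against the sample file), then decoding them as UTF-8 would never produce the character "\u00E6", and the delimiter would not match. In that case, setting the reader's "encoding" option to "ISO-8859-1" instead of "UTF-8" might be the first thing to try.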
