Spark-Scala: extra quotes and escape characters added when writing CSV

My input data is in ISO-8859-1 encoding, in a cedilla-delimited file, and some fields contain a double quote. I am converting the file to UTF-8. When I do, Spark wraps any field containing a quote in double quotes and inserts a backslash escape before the embedded quote. What can I do to make sure the extra quotes and the escape character are not added to the output?

Sample Input

XYZÇVIB BROS CRANE AND BIG "TONYÇ1961-02-23Ç00:00:00

Sample Output

XYZÇ"VIB BROS CRANE AND BIG \"TONY"Ç1961-02-23Ç00:00:00

Code

val InputFormatDataFrame = sparkSession.sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", delimiter)
    .option("charset", input_format)    // input is ISO-8859-1
    .option("header", "false")
    .option("treatEmptyValuesAsNulls", "true")
    .option("nullValue", " ")
    .option("quote", "")
    .option("quoteMode", "NONE")
    //.option("escape", "\"")
    .option("ignoreLeadingWhiteSpace", "true")
    .option("ignoreTrailingWhiteSpace", "true")
    .option("mode", "FAILFAST")
    .load(input_location)

// rewrite the same data as UTF-8
InputFormatDataFrame.write
    .mode("overwrite")
    .option("delimiter", delimiter)
    .option("charset", "UTF-8")
    .csv(output_location)
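For reference, here are two write-side variants I am considering to suppress quoting on output. The quoteMode option is documented for the spark-csv package writer; passing an empty quote string to the built-in writer (which Spark treats as \u0000) is a workaround I have seen suggested rather than documented behavior, so I am not certain either is the intended approach:

// Variant 1: write through the spark-csv package writer, which accepts quoteMode
InputFormatDataFrame.write
    .format("com.databricks.spark.csv")
    .mode("overwrite")
    .option("delimiter", delimiter)
    .option("quoteMode", "NONE")   // never quote fields; output should be UTF-8 by default
    .save(output_location)

// Variant 2: built-in CSV writer; an empty quote string is treated as \u0000,
// which effectively disables quoting and escaping
InputFormatDataFrame.write
    .mode("overwrite")
    .option("delimiter", delimiter)
    .option("encoding", "UTF-8")
    .option("quote", "")
    .csv(output_location)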