My input data is in ISO-8859-1 encoding, in a cedilla-delimited file (Ç is the delimiter). Some field values contain a double quote. When I convert the file to UTF-8 with Spark, it inserts an escape character and extra quotes around those values. What can I do to make sure the extra quotes and the escape character are not added to the output?
Sample Input
XYZÇVIB BROS CRANE AND BIG "TONYÇ1961-02-23Ç00:00:00
Sample Output
XYZÇ"VIB BROS CRANE AND BIG \"TONY"Ç1961-02-23Ç00:00:00
Code
// Read the cedilla-delimited file in its source encoding
val InputFormatDataFrame = sparkSession.sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", delimiter)
  .option("charset", input_format)
  .option("header", "false")
  .option("treatEmptyValuesAsNulls", "true")
  .option("nullValue", " ")
  .option("quote", "")
  .option("quoteMode", "NONE")
  //.option("escape", "\"")
  .option("ignoreLeadingWhiteSpace", "true")
  .option("ignoreTrailingWhiteSpace", "true")
  .option("mode", "FAILFAST")
  .load(input_location)

// Write the same data back out as UTF-8 CSV
InputFormatDataFrame.write
  .mode("overwrite")
  .option("delimiter", delimiter)
  .option("charset", "UTF-8")
  .csv(output_location)
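For completeness, here is a sketch of the write side with quoting disabled entirely. Setting the quote (and escape) character to the NUL character `"\u0000"` is a workaround I have seen suggested for Spark's CSV writer; I have not verified that it behaves identically across all Spark versions, so treat it as an assumption rather than a confirmed fix:

```scala
// Sketch (unverified assumption): pass "\u0000" (NUL) as the quote and escape
// characters so the CSV writer effectively has no quote/escape character and
// writes field values verbatim.
InputFormatDataFrame.write
  .mode("overwrite")
  .option("delimiter", delimiter)
  .option("quote", "\u0000")   // no effective quote character
  .option("escape", "\u0000")  // no effective escape character
  .option("charset", "UTF-8")
  .csv(output_location)
```

With no quote character in play, the writer should emit the embedded double quote as-is instead of wrapping the field and escaping it.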