Spark CSV datasource unable to write leading or trailing control characters

// toDF needs the SparkSession implicits (already in scope in spark-shell)
import spark.implicits._

val value: String = "\u0001" + "V1" + "\u0002"
val df = Seq(value).toDF("f1")
df.show

Now df has the correct value for field f1. But when I write it with Spark's built-in CSV format using the code below, the ^A and ^B characters do not show up in the output.

df.write.format("csv").option("delimiter", "\t").option("codec", "bzip2").save("temp_out")

Here the temp_out output does not show any ^A or ^B character for field f1.
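
For reference, one quick way to double-check the raw bytes that actually landed in temp_out (a rough sketch; the plain-text reader should decompress the .bz2 part files automatically based on their extension):

spark.read.text("temp_out").take(1)(0).getString(0).getBytes()
// expect Array(1, 86, 49, 2) if the control characters survived the write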

Looking forward to some suggestions.

There is 1 answer below

If Spark's save operation is dropping certain characters, you'll notice that when you open the CSV file(s), those bytes are missing. First, take a look at the bytes in value:

value.getBytes()    // Array[Byte] = Array(1, 86, 49, 2)
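
Before dropping down to an RDD, it may be worth testing the CSV writer's whitespace-trimming options. This is only a sketch of something to try, not a confirmed fix: the CSV writer has ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace options (defaulting to true on the write path in recent Spark versions), and whether disabling them also preserves non-whitespace control bytes like \u0001 and \u0002 depends on your Spark version. The temp_out_csv path is just illustrative so the earlier output isn't overwritten.

df.write.format("csv")
  .option("delimiter", "\t")
  .option("codec", "bzip2")
  .option("ignoreLeadingWhiteSpace", "false")   // don't trim the start of field values
  .option("ignoreTrailingWhiteSpace", "false")  // don't trim the end of field values
  .save("temp_out_csv")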

saveAsTextFile has been around for a while, and is a bit more straightforward. If you can't get the CSV option to work, this is a good workaround.

df.rdd.map(_.mkString("\t")).saveAsTextFile("temp_out")
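
If you want to keep the bzip2 compression from the original write, saveAsTextFile also accepts a Hadoop compression codec class. A minimal sketch (the temp_out_txt path is illustrative):

import org.apache.hadoop.io.compress.BZip2Codec

df.rdd.map(_.mkString("\t")).saveAsTextFile("temp_out_txt", classOf[BZip2Codec])

One caveat with the mkString approach: unlike the CSV writer it does no quoting or escaping, so it only behaves like proper delimited output when field values cannot contain the delimiter.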

You'll probably still be able to read the file using the csv method from the reader, without any dropped characters, as below (but you'll want to confirm with your specific setup):

spark.read.option("delimiter", "\t").csv("temp_out/").take(1)(0).getString(0).getBytes()
// result is Array[Byte] = Array(1, 86, 49, 2)