read percentage values in spark


I have an xlsx file which has a single column:

percentage
30%
40%
50%
-10%
0.00%
0%
0.10%
110%
99.99%
99.98%
-99.99%
-99.98%

When I read this using Apache Spark, the output I get is:

+----------+
|percentage|
+----------+
|       0.3|
|       0.4|
|       0.5|
|      -0.1|
|       0.0|
|       0.0|
|     0.001|
|       1.1|
|    0.9999|
|    0.9998|
+----------+

The expected output is:

+----------+
|percentage|
+----------+
|       30%|
|       40%|
|       50%|
|      -10%|
|     0.00%|
|        0%|
|     0.10%|
|      110%|
|    99.99%|
|    99.98%|
+----------+

My code:

val spark = SparkSession
  .builder
  .appName("trimTest")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .option("maxRowsInMemory", 1000)
  .option("inferSchema", "true")
  .load("data/percentage.xlsx")

  df.printSchema()
  df.show(10)

I don't want to cast or turn inferSchema off; I want a way to read a percentage value as a percentage, not as a double or a string.



Well, percentages ARE doubles: 30% = 0.3

The only difference is how the value is displayed and, as @Artem_Aliev wrote in a comment, there is no percentage type in Spark that would print out as you expect. But once again: a percentage is a double; same value, different notation.

The question is: what do you want to do with those percentages?

  • to "apply" them to something else, i.e. multiply, just use the double column as-is
  • to get a nice printout, convert to a suitable string before showing:
import org.apache.spark.sql.functions.format_string
import spark.implicits._

val percentString = format_string("%.2f%%", $"percentage" * 100)
df.withColumn("percentage", percentString).show()
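
Putting the pieces together, here is a minimal end-to-end sketch. It assumes the same spark-excel reader setup and column name (`percentage`) as in the question; the idea is to keep the double column for arithmetic and add the formatted string only for display:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, format_string}

val spark = SparkSession.builder
  .appName("percentDisplay")
  .master("local[*]")
  .getOrCreate()

// The Excel reader infers percentages as doubles (30% -> 0.3).
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("data/percentage.xlsx")

// For display: multiply back by 100 and append a literal '%'.
// format_string uses java.util.Formatter syntax, so "%%" is a literal percent sign.
val display = df.withColumn("percentage",
  format_string("%.2f%%", col("percentage") * 100))

display.show()
// e.g. 0.3 is rendered as 30.00%, -0.1 as -10.00%
```

Note that the formatted column is a string, so keep the original double column (or rename instead of overwriting) if you still need to compute with the values afterwards.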