How to rename columns with dots?


I use Spark 1.5.

I'm struggling with columns that contain dots in their names (e.g. param.x.y). I first had trouble selecting them, but then I read that I need to wrap the name in backtick (`) characters (`param.x.y`).

Now I'm having an issue when trying to rename the columns. I'm using a similar approach, but it doesn't seem to work:

df.withColumnRenamed("`param.x.y`", "param_x_y")

So I wanted to check: is this really a bug, or am I doing something wrong?

Answer by Sandeep Singh:

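It looks like the problem in your code is the backticks around the original column name. I removed them and it worked for me. withColumnRenamed matches its first argument against the DataFrame's existing field names rather than parsing it as a column expression, so pass the name exactly as it appears in the schema; if nothing matches, the call is a no-op and returns the DataFrame unchanged. Here is sample working code that renames a column within the DataFrame: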

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
// Import Spark SQL data types
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object RenameColumn {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("read local file")
    conf.set("spark.executor.memory", "100M")
    conf.setMaster("local")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Create an RDD from a local text file of comma-separated "name,age" lines
    val people = sc.textFile("C:/Users/User1/Documents/test")
    // The schema is encoded in a string; the first field name contains a dot
    val schemaString = "name.rename age"

    // Generate the schema based on the string of schema
    val schema =
      StructType(
        schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    // Convert records of the RDD (people) to Rows.
    val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
    // Apply the schema to the RDD.
    val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
    peopleDataFrame.printSchema()

    // Pass the existing name exactly as it appears in the schema (no backticks)
    val renamedSchema = peopleDataFrame.withColumnRenamed("name.rename", "name_renamed")
    renamedSchema.printSchema()
    sc.stop()
  }
}

Its output:

16/12/26 16:53:48 INFO SparkContext: Created broadcast 0 from textFile at RenameColumn.scala:28
root
 |-- name.rename: string (nullable = true)
 |-- age: string (nullable = true)

root
 |-- name_renamed: string (nullable = true)
 |-- age: string (nullable = true)

16/12/26 16:53:49 INFO SparkUI: Stopped Spark web UI at http://XXX.XXX.XXX.XXX:<port_number>
16/12/26 16:53:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

For more information, see the Spark DataFrame API documentation.
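If you need to rename every column whose name contains a dot, not just one, one approach is to compute the old/new name pairs first and then apply withColumnRenamed once per pair. The sketch below covers only the name computation, in plain Scala so it runs without Spark. RenameHelpers, dotFreeRenames and the choice of "_" as the replacement character are illustrative assumptions, not part of Spark's API.

```scala
object RenameHelpers {
  /** Hypothetical helper: pairs each column name that contains a dot with a
    * dot-free replacement ('.' -> '_' is an arbitrary choice).
    * Names without dots are omitted because they need no renaming.
    */
  def dotFreeRenames(columns: Seq[String]): Seq[(String, String)] =
    columns.collect {
      case name if name.contains(".") => name -> name.replace('.', '_')
    }
}
```

With a DataFrame df, the pairs could then be applied as RenameHelpers.dotFreeRenames(df.columns.toSeq).foldLeft(df) { case (acc, (from, to)) => acc.withColumnRenamed(from, to) }, passing the names without backticks as in the example above. This assumes withColumnRenamed accepts dotted names the way the output above shows, and it does not check whether a new name collides with an existing column.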

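Update: I also tested with backticks in the column name and got the expected output; see the code and its output below. Note that in this case the backticks become part of the stored field name (the schema prints it as `name.rename`, backticks included), so withColumnRenamed again matches the name exactly as it is written in the schema.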

    // Backticks inside the schema string become part of the field name itself
    val schemaString = "`name.rename` age"

    // Generate the schema based on the string of schema
    val schema =
      StructType(
        schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    // Convert records of the RDD (people) to Rows.
    val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
    // Apply the schema to the RDD.
    val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
    peopleDataFrame.printSchema()

    // The field is literally named `name.rename` (backticks included), so it is passed that way
    val renamedSchema = peopleDataFrame.withColumnRenamed("`name.rename`", "name_renamed")
    renamedSchema.printSchema()
    sc.stop()

Its output:

16/12/26 20:24:24 INFO SparkContext: Created broadcast 0 from textFile at RenameColumn.scala:28
root
 |-- `name.rename`: string (nullable = true)
 |-- age: string (nullable = true)

root
 |-- name_renamed: string (nullable = true)
 |-- age: string (nullable = true)

16/12/26 20:24:25 INFO SparkUI: Stopped Spark web UI at http://xxx.xxx.xxx.x:<port_number>
16/12/26 20:24:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
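Both experiments are consistent with withColumnRenamed comparing its first argument to the stored field names as plain strings: a backtick-quoted argument only matches a field whose name literally contains backticks, and when nothing matches, Spark documents the call as a no-op. The Spark-free model below illustrates that behavior. RenameModel and renameField are hypothetical names, and the model assumes exact, case-sensitive matching, whereas Spark's own resolver may ignore case depending on configuration.

```scala
object RenameModel {
  /** Simplified model of withColumnRenamed's lookup: the existing name is
    * compared with each field name as a plain string (backticks are NOT
    * stripped). If no field matches, the list comes back unchanged (no-op).
    */
  def renameField(fields: Seq[String], existing: String, newName: String): Seq[String] =
    fields.map(f => if (f == existing) newName else f)
}
```

For example, renaming "`param.x.y`" in a schema whose field is param.x.y leaves the fields untouched, which matches the behaviour described in the question, while renaming "param.x.y" produces param_x_y.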