cast method results in null values in java spark

1.1k Views Asked by At

I have a simple use case of performing join on two dataframes, I am using spark 1.6.3 version. The issue is that while trying to cast the string type to integer type using cast method the resulting column is all null values.

I have already tried all solutions mentioned here How to cast a column in dataframe? but all questions have answers for scala api and I could not find anyone that works with java api.

DataFrame dataFromDB = getDataFromDB("(select * from schema.table where 
col1 is not null)"); //This method uses spark sql 
                    //to connect to a db2 data base and get the data

//I perform the cast operation as
dataFromDB.withColumn("INCOME_DATA", dataFromDB.col("INCOME_DATA")
                                    .cast(DataTypes.IntegerType));
//but the above results in null values
//other things I tried based on the link above is below
dataFromDB.selectExpr(cast("INCOME_DATA" as integer")) //this too produces null values

//I tried to remove the whitespaces from income data column with no success
dataFromDB.select(dataFromDB.col("INCOME_DATA").toString().replaceAll("\\s+", ""); //this does not remove any whitespace

I am unable to find the solution for it, also the column I am trying to convert is of String type and may contain trailing whitespaces, could this be an issue? if yes then how can i remove them, I tried to remove them as below but doesnot seem to work. This is my first ever working with spark dataframe so any help with this is deeply appreciated. Thanks!

1

There are 1 best solutions below

1
mayurc On

Can you try something like this for your last line?

import org.apache.spark.sql.functions._
dataFromDB.withColumn("INCOME_DATA", regexp_replace($"INCOME_DATA", "\\s+", "")).select("INCOME_DATA")

In Java:

dataFromDB.withColumn("INCOME_DATA", functions.regexp_replace(functions.col("INCOME_DATA"), "\\s+", "")).select("INCOME_DATA");