I have a simple use case of performing join on two dataframes, I am using spark 1.6.3 version. The issue is that while trying to cast the string type to integer type using cast method the resulting column is all null values.
I have already tried all solutions mentioned here How to cast a column in dataframe? but all questions have answers for scala api and I could not find anyone that works with java api.
DataFrame dataFromDB = getDataFromDB("(select * from schema.table where
col1 is not null)"); //This method uses spark sql
//to connect to a db2 data base and get the data
//I perform the cast operation as
dataFromDB.withColumn("INCOME_DATA", dataFromDB.col("INCOME_DATA")
.cast(DataTypes.IntegerType));
//but the above results in null values
//other things I tried based on the link above is below
dataFromDB.selectExpr(cast("INCOME_DATA" as integer")) //this too produces null values
//I tried to remove the whitespaces from income data column with no success
dataFromDB.select(dataFromDB.col("INCOME_DATA").toString().replaceAll("\\s+", ""); //this does not remove any whitespace
I am unable to find the solution for it, also the column I am trying to convert is of String type and may contain trailing whitespaces, could this be an issue? if yes then how can i remove them, I tried to remove them as below but doesnot seem to work. This is my first ever working with spark dataframe so any help with this is deeply appreciated. Thanks!
Can you try something like this for your last line?
In Java: