I tried to connect to a Informix database in spark using the following syntax.

jdbcDF = sqlContext.read.format("jdbc").option("url", "jdbc:informix-sqli://192.168.x.xx:xxxx/INFORMIXSERVER=online").option("dbtable", "informix.detail").option("user", "user").option("password", "xxxxxx").option('driver','com.informix.jdbc.IfxDriver').load()

Conncection was successful and I could see the schema of the dataframe.

jdbcDF.printSchema() 

   root
 |-- mobile_no: string (nullable = false)
 |-- subscriber_code: string (nullable = false)
 |-- connected_date: date (nullable = true)
 |-- disconnected_on: date (nullable = true)
 |-- att0: string (nullable = true)

But when retrieve the data from the dataframe,

jdbcDF.show()

I get the following error.

Not enough tokens are specified in the string representation of a date value. "disconnected_on"

I found a same problem in the internet, IBM Knowledge Center and it says I need to change the database column in the Informix database but, in my case that is not possible.

Is their any way I can cast the 'disconnected_on' field in to string in the dataframe before load from informix table?

1

There are 1 best solutions below

0
On BEST ANSWER

In order to cast a column, you can use cast()

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast

>>> df.select(df.age.cast("string").alias('ages')).collect()
[Row(ages=u'2'), Row(ages=u'5')]

You can drop the old column using drop()

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html

drop(*cols)

Returns a new DataFrame that drops the specified column. This is a no-op if schema doesn’t contain the given column name(s).
Parameters: cols – a string name of the column to drop, or a Column to drop, or a list of string name of the columns to drop.

>>> df.drop('age').collect()
[Row(name=u'Alice'), Row(name=u'Bob')]

>>> df.drop(df.age).collect()
[Row(name=u'Alice'), Row(name=u'Bob')]

Combining those two functions, you can add a new column disconnected_on_str, which is disconnected_on cast to be string, and drop the old column disconnected_on :

jdbcDF_cast = jdbcDF.withColumn("disconnected_on_str", jdbcDF["disconnected_on"].cast("string")).drop("disconnected_on")