I tried to connect to a Informix database in spark using the following syntax.
jdbcDF = sqlContext.read.format("jdbc").option("url", "jdbc:informix-sqli://192.168.x.xx:xxxx/INFORMIXSERVER=online").option("dbtable", "informix.detail").option("user", "user").option("password", "xxxxxx").option('driver','com.informix.jdbc.IfxDriver').load()
Conncection was successful and I could see the schema of the dataframe.
jdbcDF.printSchema()
root
|-- mobile_no: string (nullable = false)
|-- subscriber_code: string (nullable = false)
|-- connected_date: date (nullable = true)
|-- disconnected_on: date (nullable = true)
|-- att0: string (nullable = true)
But when retrieve the data from the dataframe,
jdbcDF.show()
I get the following error.
Not enough tokens are specified in the string representation of a date value. "disconnected_on"
I found a same problem in the internet, IBM Knowledge Center and it says I need to change the database column in the Informix database but, in my case that is not possible.
Is their any way I can cast the 'disconnected_on' field in to string in the dataframe before load from informix table?
In order to cast a column, you can use
cast()
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast
You can drop the old column using
drop()
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html
drop(*cols)
Combining those two functions, you can add a new column
disconnected_on_str
, which isdisconnected_on
cast
to bestring
, anddrop
the old columndisconnected_on
: