I am trying to load a PySpark DataFrame into a Cosmos DB container. One of my columns (rating) holds both string and integer values.
ID | rating |
---|---|
id1 | 5 |
id2 | bad |
I want the data loaded into Cosmos with each value keeping its own data type. In PySpark I tried casting based on the value, similar to this. I have also tried different versions of the below, such as checking the values with rlike("^[0-9]+$").
from pyspark.sql.functions import col, when
df = df.withColumn("rating", when(col("rating").cast("int").isNotNull(), col("rating").cast("int")).otherwise(col("rating")))
But once I load the data into Cosmos, every value arrives as a string with "" around it, for example "5" and "bad". What I want instead is 5 and "bad".
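To make the target concrete, here is a minimal plain-Python sketch (no Spark or Cosmos involved) of the JSON shape I am hoping to end up with. The helper name `coerce_rating` and the lowercase `id` key are only illustrative and are not part of my actual job.

```python
import json
import re


def coerce_rating(value):
    """Return an int when the string is all ASCII digits, else the value unchanged."""
    if isinstance(value, str) and re.fullmatch(r"[0-9]+", value):
        return int(value)
    return value


# Sample rows mirroring the table above; every rating starts out as a string.
rows = [
    {"id": "id1", "rating": "5"},
    {"id": "id2", "rating": "bad"},
]

# Serialize each row with the rating converted per value.
docs = [json.dumps({**row, "rating": coerce_rating(row["rating"])}) for row in rows]

for doc in docs:
    print(doc)
# {"id": "id1", "rating": 5}
# {"id": "id2", "rating": "bad"}
```

In other words, the numeric rating should be stored as a JSON number and the text rating as a JSON string, within the same field.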
I am not sure whether this is related to the Cosmos config used when writing the data, so here are my settings:
"spark.cosmos.write.strategy": "ItemOverwrite",
"spark.cosmos.serialization.inclusionMode" : "NonNull",
"spark.cosmos.write.bulk.enabled": "true",
"mode" : "Append",
"Upsert" : "true"