How to load a PySpark dataframe into Cosmos DB with different datatypes in a single column


I am trying to load a PySpark dataframe into a Cosmos DB container. One of my columns (rating) has values that are both strings and ints:

ID   rating
id1  5
id2  bad

I want to load the data into Cosmos DB with the correct type for each value. In PySpark I tried casting the datatype based on the value, similar to the code below, and I have tried several variants of it, such as checking the values with rlike("^[0-9]+$").

from pyspark.sql.functions import col, when

df = df.withColumn(
    "rating",
    when(col("rating").cast("int").isNotNull(), col("rating").cast("int"))
    .otherwise(col("rating")),
)
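Worth noting: a Spark DataFrame column can hold only one data type, so when the two branches of when/otherwise disagree (int vs. string), Spark coerces both to a common type, which is string here. That is likely why every value reaches Cosmos DB quoted. Below is a minimal pure-Python sketch of the per-value coercion one could apply row by row outside a typed column (e.g., in an rdd.map before writing, or when upserting through the azure-cosmos SDK); the helper name coerce_rating is hypothetical, not part of any library:

```python
def coerce_rating(value):
    """Return an int when the raw rating is numeric, otherwise keep the string.

    Hypothetical helper: mimics the intended mixed-type behaviour outside a
    typed DataFrame column (a Spark column itself cannot hold both types).
    """
    if value is None:
        return None
    text = str(value).strip()
    # Accept an optional sign followed by digits, e.g. "5" or "-3"
    if text.lstrip("+-").isdigit():
        return int(text)
    return text


# Example rows mirroring the question's data
rows = [{"id": "id1", "rating": "5"}, {"id": "id2", "rating": "bad"}]
docs = [{**row, "rating": coerce_rating(row["rating"])} for row in rows]
# docs[0]["rating"] is the int 5; docs[1]["rating"] stays the string "bad"
```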

But once I load the data into Cosmos DB, every value arrives as a string, with "" around it, for example "5" and "bad". What I want instead is 5 and "bad".

I am not sure if this is related to the Cosmos DB configuration used when writing the data, so here are my settings:

"spark.cosmos.write.strategy": "ItemOverwrite",
"spark.cosmos.serialization.inclusionMode" : "NonNull",
"spark.cosmos.write.bulk.enabled": "true",
"mode" : "Append",
"Upsert" : "true"
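For reference, here is a sketch of how the options above are typically gathered into a dict and passed to the writer, assuming the Azure Cosmos DB Spark 3 OLTP connector (format "cosmos.oltp"). The endpoint, key, database, and container values are placeholders, and the save mode is passed via .mode(...) rather than as an option; this is an illustration, not the asker's exact setup:

```python
# Write configuration from the question collected into an options dict.
# Placeholder connection values; only the options shown in the question
# plus the connection settings the connector requires.
cosmos_cfg = {
    "spark.cosmos.accountEndpoint": "<endpoint>",
    "spark.cosmos.accountKey": "<key>",
    "spark.cosmos.database": "<database>",
    "spark.cosmos.container": "<container>",
    "spark.cosmos.write.strategy": "ItemOverwrite",
    "spark.cosmos.serialization.inclusionMode": "NonNull",
    "spark.cosmos.write.bulk.enabled": "true",
}

# Usage (not run here):
# df.write.format("cosmos.oltp").options(**cosmos_cfg).mode("Append").save()
```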
