Apache Spark: asc not working as expected


I have following code:

df.orderBy(expr("COUNTRY_NAME").desc, expr("count").asc).show()

I expect count column to be arranged in ascending order for a given COUNTRY_NAME. But I see something like this:

(screenshot of the sorted output omitted)

The last value, 12, is not in the expected position.

Why is it so?

1 Answer

BEST ANSWER

If you run df.printSchema(), you'll see that your "count" column has the string datatype, so the sort is lexicographic (character by character) rather than numeric.
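The same lexicographic-vs-numeric behavior can be seen in plain Python, independent of Spark (the sample values here are illustrative, not taken from the question's data):

```python
# Strings sort character by character, so "12" sorts before "4"
# because '1' < '4':
counts_as_strings = ["4", "12", "9"]
print(sorted(counts_as_strings))   # ['12', '4', '9']  -- lexicographic

# Casting to int restores the numeric order:
counts_as_ints = [int(c) for c in counts_as_strings]
print(sorted(counts_as_ints))      # [4, 9, 12]  -- numeric
```

This is exactly what happens inside Spark when a string-typed column is passed to orderBy.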

In PySpark, you can cast the column and then sort on both keys in a single orderBy call (note that chaining a second orderBy would discard the first sort rather than act as a secondary key):

df = df.withColumn('count', df['count'].cast('int'))
df.orderBy(['COUNTRY_NAME', 'count'], ascending=[False, True]).show()

If possible, you should define and apply your schema when the data is read in, so the columns get the correct types from the start.