How to convert short date D-MMM-yyyy using PySpark


Why does only the January date parse when I convert the strings with the code below?

    from pyspark.sql.functions import col, to_date

    df2 = spark.createDataFrame([["05-Nov-2000"], ["02-Jan-2021"]], ["date"])
    df2 = df2.withColumn("date", to_date(col("date"), "D-MMM-yyyy"))

    display(df2)

Result:

Date
------------
undefined
2021-01-02


There are 2 best solutions below

vladsiv On

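D is the day of the year, not the day of the month.

The 02-Jan-2021 row works because day 2 of the year does fall in January, but day 5 of the year does not fall in November, so 05-Nov-2000 conflicts with the pattern and that row comes back empty.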

If you try:

    data = [{"date": "05-Jan-2000"}, {"date": "02-Jan-2021"}]

It will work for both.
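The day-of-year reading can be checked without Spark. This is only a sketch in plain Python, not what Spark runs internally; `day_of_year_to_date` is a hypothetical helper written for this illustration. It turns a day-of-year number into a calendar date so you can see which month it lands in:

```python
from datetime import date, timedelta


def day_of_year_to_date(year: int, day_of_year: int) -> date:
    """Return the calendar date for a 1-based day of the year."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


# "05-Nov-2000" read with D-MMM-yyyy: day 5 of 2000 is in January, not November
nov_row = day_of_year_to_date(2000, 5)
# "02-Jan-2021" read with D-MMM-yyyy: day 2 of 2021 is in January, matching "Jan"
jan_row = day_of_year_to_date(2021, 2)

print(nov_row, nov_row.month == 11)  # 2000-01-05 False
print(jan_row, jan_row.month == 1)   # 2021-01-02 True
```

Any day-of-year value up to 31 lands in January, which is why only strings with "Jan" as the month are consistent with the D pattern here.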

However, what you need is d, the day of the month, so use d-MMM-yyyy.

For further information please see: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Domi On

D is day-of-the-year. What you're looking for is d - day of the month.

PySpark's date patterns are based on the Java DateTimeFormatter patterns: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/time/format/DateTimeFormatter.html

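    from pyspark.sql.functions import col, to_date

    df2 = spark.createDataFrame([["05-Nov-2000"], ["02-Jan-2021"]], ["date"])
    # dd = two-digit day of the month (D would mean day of the year)
    df2 = df2.withColumn("date", to_date(col("date"), "dd-MMM-yyyy"))
    df2.show()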


+----------+
|      date|
+----------+
|2000-11-05|
|2021-01-02|
+----------+