In a Code Repository, using PySpark, I'm trying to take today's date and, based on it, retrieve the last day of the prior quarter. This date would then be used to filter data in a DataFrame. I tried creating a DataFrame in a Code Repository, but that wasn't working. My code works in Code Workbook. This is my Code Workbook code:
import datetime as dt
import pyspark.sql.functions as F

def unnamed():
    date_df = spark.createDataFrame([(dt.date.today(),)], ['date'])
    date_df = date_df \
        .withColumn('qtr_start_date', F.date_trunc('quarter', F.col('date'))) \
        .withColumn('qtr_date', F.date_sub(F.col('qtr_start_date'), 1))
    return date_df
Any help would be appreciated.
I got the following code to run successfully in a Code Repository:
You'll need to pass the ctx argument into your transform, and you can make the pyspark.sql.DataFrame directly using the underlying spark_session variable.
If you already have the date column available in your input, you'll just need to make sure it's the Date type so that the F.date_trunc call works on the correct type.
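If the date is only needed as a filter value, one alternative is to compute it in plain Python on the driver, so no one-row DataFrame is needed at all. The following is a minimal sketch of that idea, not the code referred to above; the helper name last_day_of_prior_quarter is hypothetical and chosen only for illustration.

```python
import datetime as dt


def last_day_of_prior_quarter(today: dt.date) -> dt.date:
    """Return the last calendar day of the quarter before the one containing `today`."""
    # First month of the current quarter: 1, 4, 7 or 10.
    quarter_start_month = 3 * ((today.month - 1) // 3) + 1
    quarter_start = dt.date(today.year, quarter_start_month, 1)
    # The day before the current quarter starts is the last day of the prior quarter.
    return quarter_start - dt.timedelta(days=1)


# Hypothetical usage: the cutoff date relative to the day the job runs.
cutoff = last_day_of_prior_quarter(dt.date.today())
```

The resulting cutoff could then be used in a filter such as df.filter(F.col('date') <= F.lit(cutoff)), where df and the column name date are placeholders for your own input. Note that this evaluates "today" on the driver at build time, the same as dt.date.today() does in the original snippet.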