I just want to show one way to convert a PySpark DataFrame to a Python string. Task: get a Python string from a PySpark DataFrame.
If you know how to make this easier, let me know!
- I got the DataFrame from spark.sql():
sql = f"""
SELECT max(calculation_dt) max_calc FROM default.table
"""
max_calc_dt = spark.sql(sql)
It returned only one row:
+----------+
|  max_calc|
+----------+
|2023-07-31|
+----------+
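(For reference, that table is just the DataFrame display; a quick sanity check is:)
max_calc_dt.show()  # prints the single-row result above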
- Change the type of the max_calc column from date to string:
from pyspark.sql.functions import col

a = max_calc_dt.select(col('max_calc').cast('string'))  # cast max_calc from date to string
- Convert the DataFrame to an RDD and collect the rows to the driver:
a_rdd = a.rdd.collect()  # a list of Row objects, e.g. [Row(max_calc='2023-07-31')]
- Get the Python string. There are two ways: loop over the collected rows with a for loop, or take the value by index. I'll show the second option here (a sketch of the loop variant follows at the end):
res = a_rdd[0]['max_calc'].strip()
print(res)
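For completeness, here is a minimal sketch of the loop variant mentioned above, using the same a_rdd list; with a single row it gives the same string:
res = None
for row in a_rdd:
    res = row['max_calc'].strip()
print(res)  # 2023-07-31
As for an easier route: I believe collect() can also be called on the DataFrame directly, without going through .rdd, e.g. res = a.collect()[0]['max_calc'], which skips one step.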