The Spark Scala API has a Dataset#transform
method that makes it easy to chain custom DataFrame transformations like so:
val weirdDf = df
  .transform(myFirstCustomTransformation)
  .transform(anotherCustomTransformation)
I don't see an equivalent transform
method for PySpark in the documentation.
Is there a PySpark way to chain custom transformations?
If not, how can the pyspark.sql.DataFrame
class be monkey patched to add a transform
method?
Update
A transform method was added to pyspark.sql.DataFrame as of PySpark 3.0.
Implementation:
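A minimal monkey-patch sketch for PySpark versions that predate the built-in method; the helper name transform and attaching it directly to DataFrame are choices made here for illustration, not anything defined by the library:

from pyspark.sql.dataframe import DataFrame

def transform(self, f):
    # Apply f to this DataFrame and return whatever f returns,
    # so calls can be chained: df.transform(a).transform(b)
    return f(self)

# Attach the helper so every DataFrame instance picks it up
DataFrame.transform = transform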
Usage:
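A usage sketch, assuming either PySpark 3.0+ or the monkey patch above; with_greeting and with_farewell are hypothetical custom transformations, each taking a DataFrame and returning a DataFrame:

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

def with_greeting(df):
    return df.withColumn("greeting", lit("hi"))

def with_farewell(df):
    return df.withColumn("farewell", lit("bye"))

df = spark.createDataFrame([("jose", 1), ("li", 2)], ["name", "age"])

# Each transform call passes the current DataFrame to the given
# function and continues the chain with the DataFrame it returns
actual_df = (df
    .transform(with_greeting)
    .transform(with_farewell))

actual_df.show()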