Is there a way to calculate a moving median for an attribute in a Spark DataFrame?
I was hoping it would be possible to calculate a moving median using a window function (by defining a window using `rowsBetween(0, 10)`), but there is no functionality to calculate it (similar to `average` or `mean`).
In Spark 2.1+, to find the median we can use the functions `percentile` and `percentile_approx`. We can use them both in aggregations and with window functions. As you originally wanted, you can use `rowsBetween()` too.

Examples using PySpark: