I want to make bins on a PySpark DataFrame column using both ranges of values and specific single values.
The post How to bin in PySpark? explains how to make the bins, but only for ranges of values (0 to 6, 6 to 18, and so on). I would also like a bin for one specific value; for example, I want bins for 0 to 5, 5 to 9, exactly 10, 11 to 20, and 21 and above, giving:
+-----+-----+
| Age | Bin |
+-----+-----+
|   4 |   1 |
|  10 |   3 |
|   6 |   2 |
|  40 |   5 |
|   2 |   1 |
|   8 |   2 |
+-----+-----+
I tried
splits = [ 0, 5, 9, 10, 10, 11, float('Inf') ]
but it raises an error, since Bucketizer requires the splits to be strictly increasing.
I used a tolerance to make a bin for just 10, but I am wondering if there is a better way:
tol = 1.0e-10
splits = [ 0, 5, 9, 10, 10 + tol, 11, float('Inf') ]
You can manually specify a CASE WHEN column for binning: