I have a column named ViolationTime in my Hive table. It contains time in 24-hour HHmm format, for example 1424.
The table contains 10 million rows. I want to divide it into 6 discrete groups to perform operations.
I tried using ntile, but it divides the rows based on ascending or descending order. I'd like this column to be divided into discrete intervals instead.
In Hive 3.0 and newer, the `width_bucket()` function does that. You may find, though, that you need to convert your `HHmm` time values to INTs first (e.g. number of seconds since midnight) to make it work well.
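A minimal sketch of that approach, under some assumptions that are not in the question: the table is called `violations` (a hypothetical name), and `ViolationTime` is stored as a zero-padded 4-character string such as `'1424'`. It converts to minutes since midnight rather than seconds, since an HHmm value carries no seconds component, and splits the day into 6 equal 4-hour buckets.

```sql
-- Hypothetical table name: violations.
-- Assumes ViolationTime is a zero-padded HHmm string, e.g. '1424'.
SELECT
  ViolationTime,
  width_bucket(
    CAST(substr(ViolationTime, 1, 2) AS INT) * 60
      + CAST(substr(ViolationTime, 3, 2) AS INT),  -- minutes since midnight
    0,      -- lower bound: 00:00
    1440,   -- upper bound: 24:00 (values below this fall in buckets 1..6)
    6       -- number of equal-width buckets, 240 minutes each
  ) AS time_bucket
FROM violations;
```

With these bounds, `1424` becomes 864 minutes and should land in bucket 4 (12:00 to 15:59). If the column is stored as an integer instead, the same minute count can be computed as `floor(ViolationTime / 100) * 60 + ViolationTime % 100`.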