I have a column named ViolationTime in my Hive table. It contains time in 24-hour HHmm format, for example 1424.
The table contains 10 million rows. I want to divide it into 6 discrete groups to perform operations.
I tried using ntile, but it divides the values based on ascending or descending order. I'd like this column to be divided into discrete intervals.
In Hive 3.0 and newer, the width_bucket() function does that. You may find, though, that you need to convert your HHmm time values to INTs first (e.g. number of seconds since midnight) for it to work well.
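As a rough sketch of how the two steps might fit together, the query below converts each HHmm value to minutes since midnight (minutes rather than seconds, since HHmm carries no seconds component) and then passes the result to width_bucket() to get 6 equal-width intervals of 4 hours each. The table name violations and the aliases minutes_since_midnight and time_bucket are hypothetical, and the query assumes ViolationTime always holds a valid HHmm value that casts cleanly to INT.

```sql
-- Hypothetical table name: violations
-- Assumption: ViolationTime is a valid HHmm value (string or integer), e.g. 1424
SELECT
  ViolationTime,
  -- 6 equal-width buckets over [0, 1440) minutes, i.e. 4 hours per bucket
  width_bucket(minutes_since_midnight, 0, 1440, 6) AS time_bucket
FROM (
  SELECT
    ViolationTime,
    -- hours * 60 + minutes: 1424 -> 14 * 60 + 24 = 864
    (CAST(ViolationTime AS INT) DIV 100) * 60
      + (CAST(ViolationTime AS INT) % 100) AS minutes_since_midnight
  FROM violations
) t;
```

With these bounds, 1424 becomes 864 minutes and falls in bucket 4, which covers 12:00 through 15:59. Values below the lower bound would land in bucket 0 and values at or above the upper bound in bucket 7, so rows in either of those buckets would point to malformed input.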