How to simulate BigQuery's quantiles in Hive

730 Views Asked by At

I want to simulate BigQuery's QUANTILES function in Hive.

Data set: 1,2,3,4

BigQuery's query result will return value 2

select nth(2, quantiles(col1, 3))

But in Hive:

select percentile(col1, 0.5)

I've got 2.5

Note: I've got same result for odd number of records.

Is there any adequate Hive's udf functions?

1

There are 1 best solutions below

1
On

I guess what you are looking for is the percentile_approx UDF.

This page gives you the list of all built-in UDFs in Hive.

percentile_approx(DOUBLE col, p [, B])

Returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value.