Hive Query: Selecting column over a partition based on a median of a different column

187 Views Asked by At

I need help in modeling a query as I am unable to do it.

My data is:

id   name   school   height
1    A      S1       10
2    B      S1       12
3    C      S1       14
4    D      S2       15
5    E      S2       16
6    F      S2       17

I want to select the name and the name with median height per school.

Expected output:

id   name  school  myval
1    A    S1    B
2    B    S1    B
3    C    S1    B
4    D    S2    E
5    E    S2    E
6    F    S2    E

Here, person B has the median height in school S1 and E has in S2.

I know we can get median using percentile. But I am not able to figure out how to select the value per partition basis.

2

There are 2 best solutions below

1
On

This gives the median column

select a.id,a.name,a.school,a.height, b.median from your_table a join (select school, CAST(percentile(CAST(height as BIGINT),0.5) as INT) as median from your_table group by school) b on a.school = b.school;
1
On

below query will work :-

select 
  temp1.id,
  temp1.name,
  temp1.school,
  temp2.name 
from 
  (select 
     id,
     name,
     school,
     height 
  from 
     TABLE_NAME
  ) temp1
  left Join        
   (select 
      school,
      name 
    from 
      (select 
        id,
        name,
        school,
        height,
        SUM(height) OVER 
           (PARTITION BY school)/COUNT(height) OVER 
               (PARTITION BY school) as avg 
      from 
        TABLE_NAME) AVERG 
   where height=avg ) temp2 on temp1.school=temp2.school ;