I have to build a hive SQL query for the following requirement
I have an customer table. I need to divide the total table record by 6 (i.e suppose table contain 600 records each month 100 record upto 6 month), each month there is an bracket to target a customer. suppose the bracket limit is 4 then I need to pick 5 unique email id from 5 unique account. incase 10 then 10 unique email id from 10 unique account
Note: I am using mod operation to distribute the record for 6 month.
| Account | Mod,6 | |
|---|---|---|
| acc 1 | email@acc1 | 1 |
| acc2 | email1@acc2 | 1 |
| acc2 | email2@acc2 | 2 |
| acc2 | email3@acc2 | 3 |
| acc2 | email4@acc2 | 4 |
| acc2 | email5@acc2 | 5 |
| acc2 | email6@acc2 | 6 |
| acc2 | email7@acc2 | 1 |
| acc3 | email1@acc3 | 1 |
| acc3 | email2@acc3 | 2 |
| acc3 | email3@acc3 | 3 |
| acc4 | email@acc4 | 1 |
| acc5 | email1@acc5 | 1 |
| acc5 | email2@acc5 | 2 |
Expected output - Bracket is 4 ( below output acc5 is not required as the record count already reached bracket range-4)
| Account | mod,6 | |
|---|---|---|
| acc 1 | email@acc1 | 1 |
| acc2 | email1@acc2 | 1 |
| acc3 | email1@acc3 | 1 |
| acc4 | email@acc4 | 1 |
if Bracket is 8 ( I have to pick the all the unique account first and then other sequence to reach the bracket range )
Expected output
| Account | mod,6 | |
|---|---|---|
| acc 1 | email@acc1 | 1 |
| acc2 | email1@acc2 | 1 |
| acc3 | email1@acc3 | 1 |
| acc4 | email@acc4 | 1 |
| acc5 | email1@acc5 | 1 |
| acc2 | email7@acc2 | 1 |
| acc2 | email2@acc2 | 2 |
| acc3 | email2@acc3 | 2 |
if Bracket is 10
| Account | mod,6 | |
|---|---|---|
| acc 1 | email@acc1 | 1 |
| acc2 | email1@acc2 | 1 |
| acc3 | email1@acc3 | 1 |
| acc4 | email@acc4 | 1 |
| acc5 | email1@acc5 | 1 |
| acc2 | email7@acc2 | 1 |
| acc2 | email2@acc2 | 2 |
| acc3 | email2@acc3 | 2 |
| acc5 | email2@acc5 | 2 |
| acc2 | email3@acc2 | 3 |
I tried below query. But it fetch all the 1 record first. I am not sure how to fetch unique account record with mod_seq_value 1 first and then start the remain records from the mod seq -1.
select * from (
select *, Row_number() over(order by mod_num_seq,acc_count) as rnk
select account,email,
count(*) over(partition by account) as acc_count
,case
when mod(row_number() over(partition by account),6)=0 then 6
else mod(row_number() over(partition by account),6)=0
end as mod_num_seq
from
customer
)a
)b where rnk<={:bracket}
Not sure why 6th and 7th row in your output are
email7@acc2, email2@acc2instead ofemail7@acc2, email2@acc3.Result: