I am trying to write a HiveQL
(or even ANSI SQL
) query that can answer the following:
I have a database with events (transactions), each event is stored with a user_id
and a trans_time
(transaction time). A user_id
can have unlimited events, even possible to have multiple events with the same trans_time
.
If I establish an activity window of time (i.e. 10 seconds), how many windows had how many events within that window? I'm thinking the result would be something like this (obviously without the words, just numbers)...
100 windows had only 1 event
50 windows had 2 events
.
.
1 window that had 30 events
To describe the first result...There were 100 instances in the data set where only 1 event took place within the 10 second window.
Is there a way to pull another metric out of the same data that says,
50 user_ids had only 1 event
25 user_ids had 2 events
.
.
1 user_id had 30 events
To describe the first result...There were 50 user_ids that had only 1 event within the 10 second window.
I hope this is not too nebulous. As always, thank you!!
if your windows are static, for example :
window 1 - 12:00:00 - 12:00:09
window 2 - 12:00:10 - 12:00:19 etc.
You can do this :
Note the "10" in the trans_time "rounding". you can change it to any size of windows you need.