Within each ID, I'm trying to rank each IP based on how often they show up within the last 3 months. For example, in the table below, the IP 98.28.88.148 should have a ranking of 1 because they showed up the most within the last 3 months. IP 166.194.154.19 should have rank of 2, and IP 64.227.19.120 should have rank of 3 since it's least frequent. If 2 IPs show up the same number of times, they should be assigned the same Rank.
ID | Login_TS_Three_Months_Prior | Login_TS | IP | Count_of_Logins_Three_Months_Prior |
---|---|---|---|---|
1234 | 2023-08-04T13:40:08.512Z | 2023-11-02T13:40:08.512Z | 64.227.19.120 | 0 |
1234 | 2023-08-04T14:35:46.717Z | 2023-11-02T14:35:46.717Z | 64.227.19.120 | 1 |
1234 | 2023-08-06T00:56:25.895Z | 2023-11-04T00:56:25.895Z | 98.28.88.148 | 0 |
1234 | 2023-08-10T15:50:21.845Z | 2023-11-08T15:50:21.845Z | 98.28.88.148 | 1 |
1234 | 2023-08-11T17:07:28.623Z | 2023-11-09T17:07:28.623Z | 98.28.88.148 | 2 |
1234 | 2023-08-12T16:52:18.59Z | 2023-11-10T16:52:18.59Z | 98.28.88.148 | 3 |
1234 | 2023-08-18T23:17:12.471Z | 2023-11-16T23:17:12.471Z | 166.194.154.19 | 0 |
1234 | 2023-08-22T20:42:57.979Z | 2023-11-20T20:42:57.979Z | 166.194.154.19 | 1 |
1234 | 2023-08-25T22:36:52.638Z | 2023-11-23T22:36:52.638Z | 166.194.154.19 | 2 |
The ideal output should look like this:
ID | Login_TS_Three_Months_Prior | Login_TS | IP | Count_of_Logins_Three_Months_Prior | IP_Ranking |
---|---|---|---|---|---|
1234 | 2023-08-04T13:40:08.512Z | 2023-11-02T13:40:08.512Z | 64.227.19.120 | 0 | 3 |
1234 | 2023-08-04T14:35:46.717Z | 2023-11-02T14:35:46.717Z | 64.227.19.120 | 1 | 3 |
1234 | 2023-08-06T00:56:25.895Z | 2023-11-04T00:56:25.895Z | 98.28.88.148 | 0 | 1 |
1234 | 2023-08-10T15:50:21.845Z | 2023-11-08T15:50:21.845Z | 98.28.88.148 | 1 | 1 |
1234 | 2023-08-11T17:07:28.623Z | 2023-11-09T17:07:28.623Z | 98.28.88.148 | 2 | 1 |
1234 | 2023-08-12T16:52:18.59Z | 2023-11-10T16:52:18.59Z | 98.28.88.148 | 3 | 1 |
1234 | 2023-08-18T23:17:12.471Z | 2023-11-16T23:17:12.471Z | 166.194.154.19 | 0 | 2 |
1234 | 2023-08-22T20:42:57.979Z | 2023-11-20T20:42:57.979Z | 166.194.154.19 | 1 | 2 |
1234 | 2023-08-25T22:36:52.638Z | 2023-11-23T22:36:52.638Z | 166.194.154.19 | 2 | 2 |
if we get your data via a CTE, only keeping the id, timestamp, & ip address (and adding second batch with a draw in it):
we first want to do the count (well really we might want to do the filter, but I assume you have this bit sorted, so we will can count). Using a window function form of COUNT:
those id/ip batches counted.
now for ranking there are three forms, ROW_NUMBER, DENSE_RANK and RANK. The first gives every item it's own number, the second gives duplicates equal numbers, and increases by 1, the third gives equals the same, but the 100th RANK item will have only 100 values prior to it (this is like the Olympic's a draw for first gives 1,1,3), it sounds like what you want is the second.
Now given we are going to run this in a window function fashion again, it needs the prior window function for the count, to be at a different query layer (a sub-select), thus some code show the second/third options:
here we see in the 1235 id batch, the two IP's with a count of 2 are 2nd equal. Now the question becomes what do you want to happen to the next count, if you want that to be 3, this method works just fine. But if you want the next number after duplicates to be 4, we need a method of allocating sparse values, and join those results.
so if the above code is what you want, then the clean form looks like:
but if we want the shared count, but correctly counted gaps:
so with an extra row of data:
the dense form allocation 3:
where-as the sparse version with the join allocates 4:
with more time consideration:
so the last 3 months of data per row starts with this:
then we want count these rows:
then we want to rank those:
now we want to throw away the lines, where ip <> b_ip as those are the other ranked lines:
thus we now have the three month window from this row, ranked:
giving:
Now, clearly for your data, it will not all be ones, and you still need to pick dense or sparse. But this code seem to be what you are wanting.