Assign ranking to IPs based on frequency, within rolling windows

Question

Assign ranking to IPs based on frequency, within rolling windows

78 Views Asked by Stanleyrr At 29 July 2025 at 06:21

Within each ID, I'm trying to rank each IP based on how often they show up within the last 3 months. For example, in the table below, the IP 98.28.88.148 should have a ranking of 1 because they showed up the most within the last 3 months. IP 166.194.154.19 should have rank of 2, and IP 64.227.19.120 should have rank of 3 since it's least frequent. If 2 IPs show up the same number of times, they should be assigned the same Rank.

ID	Login_TS_Three_Months_Prior	Login_TS	IP	Count_of_Logins_Three_Months_Prior
1234	2023-08-04T13:40:08.512Z	2023-11-02T13:40:08.512Z	64.227.19.120	0
1234	2023-08-04T14:35:46.717Z	2023-11-02T14:35:46.717Z	64.227.19.120	1
1234	2023-08-06T00:56:25.895Z	2023-11-04T00:56:25.895Z	98.28.88.148	0
1234	2023-08-10T15:50:21.845Z	2023-11-08T15:50:21.845Z	98.28.88.148	1
1234	2023-08-11T17:07:28.623Z	2023-11-09T17:07:28.623Z	98.28.88.148	2
1234	2023-08-12T16:52:18.59Z	2023-11-10T16:52:18.59Z	98.28.88.148	3
1234	2023-08-18T23:17:12.471Z	2023-11-16T23:17:12.471Z	166.194.154.19	0
1234	2023-08-22T20:42:57.979Z	2023-11-20T20:42:57.979Z	166.194.154.19	1
1234	2023-08-25T22:36:52.638Z	2023-11-23T22:36:52.638Z	166.194.154.19	2

The ideal output should look like this:

ID	Login_TS_Three_Months_Prior	Login_TS	IP	Count_of_Logins_Three_Months_Prior	IP_Ranking
1234	2023-08-04T13:40:08.512Z	2023-11-02T13:40:08.512Z	64.227.19.120	0	3
1234	2023-08-04T14:35:46.717Z	2023-11-02T14:35:46.717Z	64.227.19.120	1	3
1234	2023-08-06T00:56:25.895Z	2023-11-04T00:56:25.895Z	98.28.88.148	0	1
1234	2023-08-10T15:50:21.845Z	2023-11-08T15:50:21.845Z	98.28.88.148	1	1
1234	2023-08-11T17:07:28.623Z	2023-11-09T17:07:28.623Z	98.28.88.148	2	1
1234	2023-08-12T16:52:18.59Z	2023-11-10T16:52:18.59Z	98.28.88.148	3	1
1234	2023-08-18T23:17:12.471Z	2023-11-16T23:17:12.471Z	166.194.154.19	0	2
1234	2023-08-22T20:42:57.979Z	2023-11-20T20:42:57.979Z	166.194.154.19	1	2
1234	2023-08-25T22:36:52.638Z	2023-11-23T22:36:52.638Z	166.194.154.19	2	2

Original Q&A

There are 1 best solutions below

**Simeon Pilgrim** · Accepted Answer

if we get your data via a CTE, only keeping the id, timestamp, & ip address (and adding second batch with a draw in it):

with question_data as (
    select 
        $1 as id
        ,try_to_timestamp_ntz($2, 'yyyy-mm-dd"T"hh24:mi:ss.ff"Z"') as login_ts
        ,$3 as ip
    from values
    (1234, '2023-11-02T13:40:08.512Z', '64.227.19.120'),
    (1234, '2023-11-02T14:35:46.717Z', '64.227.19.120'),
    (1234, '2023-11-04T00:56:25.895Z', '98.28.88.148'),
    (1234, '2023-11-08T15:50:21.845Z', '98.28.88.148'),
    (1234, '2023-11-09T17:07:28.623Z', '98.28.88.148'),
    (1234, '2023-11-10T16:52:18.59Z', '98.28.88.148'),
    (1234, '2023-11-16T23:17:12.471Z', '166.194.154.19'),
    (1234, '2023-11-20T20:42:57.979Z', '166.194.154.19'),
    (1234, '2023-11-23T22:36:52.638Z', '166.194.154.19'),

    (1235, '2023-11-02T13:40:08.512Z', '64.227.19.120'),
    (1235, '2023-11-02T14:35:46.717Z', '64.227.19.120'),
    (1235, '2023-11-04T00:56:25.895Z', '98.28.88.148'),
    (1235, '2023-11-08T15:50:21.845Z', '98.28.88.148'),
    (1235, '2023-11-09T17:07:28.623Z', '98.28.88.148'),
    (1235, '2023-11-10T16:52:18.59Z', '98.28.88.148'),
    (1235, '2023-11-20T20:42:57.979Z', '166.194.154.19'),
    (1235, '2023-11-23T22:36:52.638Z', '166.194.154.19')
)

we first want to do the count (well really we might want to do the filter, but I assume you have this bit sorted, so we will can count). Using a window function form of COUNT:

    select *, 
        count(*)over(partition by id, ip) as ip_count
    from question_data

those id/ip batches counted.

now for ranking there are three forms, ROW_NUMBER, DENSE_RANK and RANK. The first gives every item it's own number, the second gives duplicates equal numbers, and increases by 1, the third gives equals the same, but the 100th RANK item will have only 100 values prior to it (this is like the Olympic's a draw for first gives 1,1,3), it sounds like what you want is the second.

Now given we are going to run this in a window function fashion again, it needs the prior window function for the count, to be at a different query layer (a sub-select), thus some code show the second/third options:

select d.*
    ,dense_rank() over(partition by d.id order by ip_count desc) as ip_ranking_d
    ,rank() over(partition by d.id order by ip_count desc) as ip_ranking_s
from (
    select *, 
        count(*)over(partition by id, ip) as ip_count
    from question_data
) as d
order by d.id, d.ip, d.login_ts;

here we see in the 1235 id batch, the two IP's with a count of 2 are 2nd equal. Now the question becomes what do you want to happen to the next count, if you want that to be 3, this method works just fine. But if you want the next number after duplicates to be 4, we need a method of allocating sparse values, and join those results.

so if the above code is what you want, then the clean form looks like:

select d.* exclude(ip_count)
    ,dense_rank() over(partition by d.id order by ip_count desc) as ip_ranking_d
from (
    select *, 
        count(*)over(partition by id, ip) as ip_count
    from question_data
) as d

but if we want the shared count, but correctly counted gaps:

select *
from question_data
natural join (
    select 
        id,
        ip,
        rank() over (partition by id order by ip_count desc) as ip_rank
    from (
        select         
            id,
            ip,
            count(*) as ip_count
        from question_data
        group by 1,2
    )
)
order by 1,2,3;

so with an extra row of data:

    (1235, '2023-11-23T22:36:52.638Z', '166.194.154.18')

the dense form allocation 3:

where-as the sparse version with the join allocates 4:

with more time consideration:

so the last 3 months of data per row starts with this:

with question_data as (
    select 
        $1 as id
        ,try_to_timestamp_ntz($2, 'yyyy-mm-dd"T"hh24:mi:ss.ff"Z"') as login_ts
        ,$3 as ip
    from values
    (1234, '2023-11-02T00:00:00.000Z', 'a'),
    (1234, '2023-10-02T00:00:00.000Z', 'a'),
    (1234, '2023-09-02T00:00:00.000Z', 'a'),
    (1234, '2023-08-02T00:00:00.000Z', 'a'),

    (1234, '2023-11-16T00:00:00.000Z', 'b'),
    (1234, '2023-10-16T00:00:00.000Z', 'b'),
    (1234, '2023-09-16T00:00:00.000Z', 'b'),
    (1234, '2023-08-16T00:00:00.000Z', 'b')
), enrich as (
    select *, seq8() as s
    from question_data
)
select a.*
    ,b.ip as b_ip
    ,b.s as b_s
    ,b.login_ts as b_login_ts
from enrich as a
left join enrich as b
    on a.id = b.id 
    and b.login_ts between dateadd('month', -3, a.login_ts) and a.login_ts
order by 4,6;

then we want count these rows:

), three_months as (
    select a.*
        ,b.ip as b_ip
        ,b.s as b_s
        ,b.login_ts as b_login_ts
    from enrich as a
    left join enrich as b
        on a.id = b.id 
        and b.login_ts between dateadd('month', -3, a.login_ts) and a.login_ts
)
select id, 
    login_ts, 
    ip, 
    s, -- does not seem to be needed anymore.. not sure where I was going with that
    b_ip, 
    count(*) as b_ip_count
from three_months
group by 1,2,3,4,5

then we want to rank those:

), counted as (
    select id, 
        login_ts, 
        ip, 
        s, -- s is used for the ranking partition..
        b_ip, 
        count(*) as b_ip_count
    from three_months
    group by 1,2,3,4,5
)
select *
    ,rank() over (partition by id, s order by b_ip_count desc) as sparse_rank
    ,dense_rank() over (partition by id, s order by b_ip_count desc) as dense_rank   
from counted
order by 1,4,5;

now we want to throw away the lines, where ip <> b_ip as those are the other ranked lines:

thus we now have the three month window from this row, ranked:

with question_data as (
    select 
        $1 as id
        ,try_to_timestamp_ntz($2, 'yyyy-mm-dd"T"hh24:mi:ss.ff"Z"') as login_ts
        ,$3 as ip
    from values
    (1234, '2023-11-02T00:00:00.000Z', 'a'),
    (1234, '2023-10-02T00:00:00.000Z', 'a'),
    (1234, '2023-09-02T00:00:00.000Z', 'a'),
    (1234, '2023-08-02T00:00:00.000Z', 'a'),

    (1234, '2023-11-16T00:00:00.000Z', 'b'),
    (1234, '2023-10-16T00:00:00.000Z', 'b'),
    (1234, '2023-09-16T00:00:00.000Z', 'b'),
    (1234, '2023-08-16T00:00:00.000Z', 'b')
), enrich as (
    select *, seq8() as s
    from question_data
), three_months as (
    select a.*
        ,b.ip as b_ip
        ,b.s as b_s
        ,b.login_ts as b_login_ts
    from enrich as a
    left join enrich as b
        on a.id = b.id 
        and b.login_ts between dateadd('month', -3, a.login_ts) and a.login_ts
), counted as (
    select id, 
        login_ts, 
        ip, 
        s, -- s is used for the ranking partition..
        b_ip, 
        count(*) as b_ip_count
    from three_months
    group by 1,2,3,4,5
)
select c.* exclude (s, b_ip, b_ip_count)
    ,rank() over (partition by id, s order by b_ip_count desc) as sparse_rank
    ,dense_rank() over (partition by id, s order by b_ip_count desc) as dense_rank   
from counted as c
qualify ip = b_ip
order by 1,4,5;

giving:

Now, clearly for your data, it will not all be ones, and you still need to pick dense or sparse. But this code seem to be what you are wanting.

Assign ranking to IPs based on frequency, within rolling windows

There are 1 best solutions below

with more time consideration:

Related Questions in SQL

Related Questions in SNOWFLAKE-CLOUD-DATA-PLATFORM

Related Questions in RANK

Related Questions in DENSE-RANK

Trending Questions

Popular # Hahtags

Popular Questions