SQL (Vertica) - Calculate number of users who returned to the app at least x days in the past 7 days

299 Views Asked by At

Suppose I have my table like:

uid  day_used_app   
---  -------------
1    2012-04-28      
1    2012-04-29        
1    2012-04-30        
2    2012-04-29       
2    2012-04-30 
2    2012-05-01       
2    2012-05-21        
2    2012-05-22   

Suppose I want the number of unique users who returned to the app at least 2 different days in the last 7 days (from 2012-05-03).

So as an example to retrieve the number of users who have used the application on at least 2 different days in the past 7 days:

select count(distinct case when num_different_days_on_app >= 2
                           then uid else null end) as users_return_2_or_more_days

    from (
         select uid,
                count(distinct day_used_app) as num_different_days_on_app
             from table
         where day_used_app between current_date() - 7 and current_date()
         group by 1

        )

This gives me:

users_return_2_or_more_days
---------------------------
            2

The question I have is:

What if I want to do this for every day up to now so that my table looks like this, where the second field equals the number of unique users who returned 2 or more different days within a week prior to the date in the first field.

        date               users_return_2_or_more_days
      --------             ---------------------------
    2012-04-28                        2
    2012-04-29                        2 
    2012-04-30                        3           
    2012-05-01                        4     
    2012-05-02                        4       
    2012-05-03                        3
2

There are 2 best solutions below

4
On
SELECT DISTINCT
    t1.day_used_app,
    (
        SELECT SUM(CASE WHEN t.num_visits >= 2 THEN 1 ELSE 0 END)
        FROM
        (
            SELECT uid,
                   COUNT(DISTINCT day_used_app) AS num_visits
            FROM table
            WHERE day_used_app BETWEEN t1.day_used_app - 7 AND t1.day_used_app
            GROUP BY uid
        ) t
   ) AS users_return_2_or_more_days
FROM table t1
1
On

Would this help?

WITH
-- your original input, don't use in "real" query ...
input(uid,day_used_app) AS (
          SELECT 1,DATE '2012-04-28'
UNION ALL SELECT 1,DATE '2012-04-29'
UNION ALL SELECT 1,DATE '2012-04-30'
UNION ALL SELECT 2,DATE '2012-04-29'
UNION ALL SELECT 2,DATE '2012-04-30'
UNION ALL SELECT 2,DATE '2012-05-01'
UNION ALL SELECT 2,DATE '2012-05-21'
UNION ALL SELECT 2,DATE '2012-05-22'
)
-- end of input, start "real" query here, replace ',' with 'WITH'
,
one_week_b4 AS (
  SELECT
    uid
  , day_used_app
  , day_used_app -7 AS day_used_1week_b4
  FROM input
)
SELECT
  one_week_b4.uid
, one_week_b4.day_used_app
, count(*) AS users_return_2_or_more_days
FROM one_week_b4
JOIN input
  ON input.day_used_app BETWEEN one_week_b4.day_used_1week_b4 AND one_week_b4.day_used_app
GROUP BY
  one_week_b4.uid
, one_week_b4.day_used_app
HAVING count(*) >= 2
ORDER BY 1;

Output is:

uid|day_used_app|users_return_2_or_more_days
  1|2012-04-29  |                          3
  1|2012-04-30  |                          5
  2|2012-04-29  |                          3
  2|2012-04-30  |                          5
  2|2012-05-01  |                          6
  2|2012-05-22  |                          2

Does that help your needs?

Marco the Sane ...