Get range between FIRST_VALUE and LAST_VALUE

421 Views Asked by At
timestamp id scope
2021-01-23 12:52:34.159999 UTC 1 enter_page
2021-01-23 12:53:02.342 UTC 1 view_product
2021-01-23 12:53:02.675 UTC 1 checkout
2021-01-23 12:53:04.342 UTC 1 search_page
2021-01-23 12:53:24.513 UTC 1 checkout

I am trying to get all the values between the FIRST_VALUE and LAST VALUE in the column 'scope' using WINDOWS/ANALYTICAL Functions

I already get the first_value() = enter_page
and the last_value() == checkout

by using windows functions in SQLite

FIRST_VALUE(scope) OVER ( PARTITION BY id ORDER BY julianday(timestamp) ASC) first_page
FIRST_VALUE(scope) OVER ( PARTITION BY id ORDER BY julianday(timestamp) DESC ) last_page

I am trying to capture all steps in between [excluding edges]: view_product, apartment_view, checkout[, N-field] to later add them into a string ( unique values -STR_AGGR() )

Once done that, i will later process trying to find if the customer at somepoint during the purchase_journey open the checkout multiple times

my result should like like

id first_page last_page inbetween_pages
1 enter_page checkout view_product, checkout, search_page

p.s. I am trying to avoid using python to process this. I would like a 'clean' way of doing this with SQL-pure

Thanks a lot guys

2

There are 2 best solutions below

1
On

Hmmm . . . I am thinking:

select id, group_concat(scope, ',')
from (select t.*,
             row_number() over (partition by id order by timestamp) as seqnum_asc,
             row_number() over (partition by id order by timestamp desc) as seqnum_desc
      from t
      order by id, timestamp
     ) t
where 1 not in (seqnum_asc, seqnum_desc)
group by id;

In SQLite, group_concat() doesn't accept an order by argument. My understanding is that it respects the ordering from the subquery, which is why the subquery has an order by.

3
On

You can do it with GROUP_CONCAT() window function which supports the ORDER BY clause so you will have the scopes in inbetween_pages in the correct order, instead of GROUP_CONCAT() aggregate function which does not support the ORDER BY clause and the results that it returns are not guaranteed to be in a specific order:

SELECT DISTINCT id, first_page, last_page,
       GROUP_CONCAT(CASE WHEN timestamp NOT IN (min_timestamp, max_timestamp) THEN scope END) 
       OVER (PARTITION BY id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) inbetween_pages
FROM (
  SELECT *,
         FIRST_VALUE(scope) OVER (PARTITION BY id ORDER BY timestamp) first_page,
         FIRST_VALUE(scope) OVER (PARTITION BY id ORDER BY timestamp DESC) last_page,
         MIN(timestamp) OVER (PARTITION BY id) min_timestamp,
         MAX(timestamp) OVER (PARTITION BY id) max_timestamp
  FROM tablename       
)

See the demo.
Results:

id first_page last_page inbetween_pages
1 enter_page checkout view_product,checkout,search_page