SQL: Assigning occasion numbers to GPS points when visiting places based on conditions

59 Views Asked by At

I have a PostgreSQL table of GPS points from multiple people with the information on the trip and place each of the points is assigned to. I want to find the occasion number at which each point_id is assigned to the place_id where overnight = 'Yes' across all trip_id of each person_id, based on the following conditions:

  1. point_id assigned to the same place_id where overnight = 'Yes' within the same trip_id should be treated as separate occasions when there is at least one point assigned to a different place_id where overnight = 'Yes' anytime in between those points. This means revisits are counted as different occasions.

  2. point_id assigned to the same place_id where overnight = 'Yes' from different trip_id should be treated as separate occasions.

Below is my table. occasion is the expected output field. Any guidance how to achieve this? Thanks!

CREATE TABLE my_table (
    point_id INT,
    trip_id INT,
    person_id INT,
    place_id INT,
    overnight VARCHAR,
    occasion INT
);

INSERT INTO my_table (point_id, trip_id, person_id, place_id, overnight, occasion) VALUES
(1, 5, 1, 9, 'Yes', 1),
(2, 5, 1, 10, 'Yes', 1),
(3, 5, 1, 10, 'Yes', 1),
(4, 5, 1, 9, NULL, NULL),
(5, 5, 1, 9, 'Yes', 2),
(6, 5, 1, 10, 'Yes', 2),
(7, 5, 1, 9, 'Yes', 3),
(8, 5, 1, 10, 'Yes', 3),
(9, 5, 1, 10, 'No', NULL),
(10, 5, 1, 10, 'No', NULL),
(11, 5, 1, 10, 'Yes', 3),
(12, 95, 2, NULL, NULL, NULL),
(13, 95, 2, 9, 'No', NULL),
(14, 96, 2, 9, 'Yes', 1),
(15, 96, 2, NULL, NULL, NULL),
(16, 96, 2, 9, 'Yes', 1),
(17, 97, 2, NULL, NULL, NULL),
(18, 97, 2, NULL, NULL, NULL),
(19, 97, 2, 9, 'Yes', 2),
(20, 98, 2, 9, 'Yes', 3),
(21, 98, 2, 9, 'No', NULL),
(22, 99, 2, 10, 'No', NULL),
(23, 99, 2, 9, 'Yes', 4),
(24, 99, 2, 10, 'No', NULL),
(25, 99, 2, 10, 'Yes', 1);

Edits:

Here is my current query. The output field final_visit_rank is identical to the expected output field occasion. The query seems to work fine, but I'm not sure if it's the most efficient or elegant solution, especially considering that my real dataset has hundreds of thousands of rows.

with t_revisit as (
select  *, 
        case when place_id is not distinct from lag(place_id) over (
            partition by person_id order by point_id rows between unbounded preceding and current row) and 
                trip_id is not distinct from lag(trip_id) over (
                    partition by person_id order by point_id rows between unbounded preceding and current row)
                        then 1
             else 0
        end as revisit
from my_table
where overnight = 'Yes'
), 

t_visit_rank as (
select  *,
        row_number() over (
            partition by person_id, place_id order by trip_id, point_id 
                rows between unbounded preceding and current row) as visit_rank
from t_revisit
where revisit = 0)

select  t.*, visit_rank, 
        case when t.revisit = 1 
                then lag(visit_rank) over (
                    partition by t.person_id, t.place_id, t.overnight order by t.point_id 
                        rows between unbounded preceding and current row)
            else visit_rank
        end as final_visit_rank
from (select my_table.*, revisit
      from my_table
      left join t_revisit
      on my_table.point_id = t_revisit.point_id) t
left join t_visit_rank
on t.point_id = t_visit_rank.point_id
order by person_id, point_id; 
0

There are 0 best solutions below