I have a PostgreSQL table of GPS points from multiple people with the information on the trip and place each of the points is assigned to. I want to find the occasion number at which each point_id is assigned to the place_id where overnight = 'Yes' across all trip_id of each person_id, based on the following conditions:
point_idassigned to the sameplace_idwhereovernight = 'Yes'within the sametrip_idshould be treated as separate occasions when there is at least one point assigned to a differentplace_idwhereovernight = 'Yes'anytime in between those points. This means revisits are counted as different occasions.point_idassigned to the sameplace_idwhereovernight = 'Yes'from differenttrip_idshould be treated as separate occasions.
Below is my table. occasion is the expected output field. Any guidance how to achieve this? Thanks!
CREATE TABLE my_table (
point_id INT,
trip_id INT,
person_id INT,
place_id INT,
overnight VARCHAR,
occasion INT
);
INSERT INTO my_table (point_id, trip_id, person_id, place_id, overnight, occasion) VALUES
(1, 5, 1, 9, 'Yes', 1),
(2, 5, 1, 10, 'Yes', 1),
(3, 5, 1, 10, 'Yes', 1),
(4, 5, 1, 9, NULL, NULL),
(5, 5, 1, 9, 'Yes', 2),
(6, 5, 1, 10, 'Yes', 2),
(7, 5, 1, 9, 'Yes', 3),
(8, 5, 1, 10, 'Yes', 3),
(9, 5, 1, 10, 'No', NULL),
(10, 5, 1, 10, 'No', NULL),
(11, 5, 1, 10, 'Yes', 3),
(12, 95, 2, NULL, NULL, NULL),
(13, 95, 2, 9, 'No', NULL),
(14, 96, 2, 9, 'Yes', 1),
(15, 96, 2, NULL, NULL, NULL),
(16, 96, 2, 9, 'Yes', 1),
(17, 97, 2, NULL, NULL, NULL),
(18, 97, 2, NULL, NULL, NULL),
(19, 97, 2, 9, 'Yes', 2),
(20, 98, 2, 9, 'Yes', 3),
(21, 98, 2, 9, 'No', NULL),
(22, 99, 2, 10, 'No', NULL),
(23, 99, 2, 9, 'Yes', 4),
(24, 99, 2, 10, 'No', NULL),
(25, 99, 2, 10, 'Yes', 1);
Edits:
Here is my current query. The output field final_visit_rank is identical to the expected output field occasion. The query seems to work fine, but I'm not sure if it's the most efficient or elegant solution, especially considering that my real dataset has hundreds of thousands of rows.
with t_revisit as (
select *,
case when place_id is not distinct from lag(place_id) over (
partition by person_id order by point_id rows between unbounded preceding and current row) and
trip_id is not distinct from lag(trip_id) over (
partition by person_id order by point_id rows between unbounded preceding and current row)
then 1
else 0
end as revisit
from my_table
where overnight = 'Yes'
),
t_visit_rank as (
select *,
row_number() over (
partition by person_id, place_id order by trip_id, point_id
rows between unbounded preceding and current row) as visit_rank
from t_revisit
where revisit = 0)
select t.*, visit_rank,
case when t.revisit = 1
then lag(visit_rank) over (
partition by t.person_id, t.place_id, t.overnight order by t.point_id
rows between unbounded preceding and current row)
else visit_rank
end as final_visit_rank
from (select my_table.*, revisit
from my_table
left join t_revisit
on my_table.point_id = t_revisit.point_id) t
left join t_visit_rank
on t.point_id = t_visit_rank.point_id
order by person_id, point_id;