Situation:
Table Waypoints is relatively large (150M rows) and contains (amongst others) the columns user_id and a timestamp tracked_at.
Table Storyline is a smaller Table (300K rows) that contains information about time episodes (t_min to t_max, also available through the tsrange segment_time), user_id and more.
Aim: Add the information of the segments table to the points table where the user_id matches and timestamp is in t_range (there is one unique match).
EXPLAIN CREATE TABLE t4 AS SELECT
w.waypoint_id,
s.storyline_id
FROM waypoints w JOIN storyline s
ON w.tracked_at <@ s.segment_time and w.user_id = s.user_id;
Question: How can I use indexes here to speed up the join?
What I thoght would be necessary here, according to this is a btree_gist index on (user_id, t_segment_time) on storyline or, if I use a query with t_min and t_max instead of the range, a normal index on (user_id, t_min, t_max DESC). However, I always get a query plan similar to the following:
"Hash Join (cost=329264.29..8085010946.88 rows=553075874 width=16)"
" Hash Cond: (w.user_id = s.user_id)"
" Join Filter: (w.tracked_at <@ s.segment_time)"
" -> Seq Scan on waypoints w (cost=0.00..4503125.76 rows=152813376 width=24)"
" -> Hash (cost=323059.13..323059.13 rows=496413 width=38)"
" -> Seq Scan on storyline s (cost=0.00..323059.13 rows=496413 width=38)"
I understand the sequential scan on waypoints (after all, i want all points back), but I feel as if an index scan rather than a Sequential scan on Storyline should be much faster. Is there a reason why a sequential scan should be faster here? If not, what indices are needed to achieve an index scan?
Edit: As recommended by a_horse_with_no_name, here are the create statements of the tables:
CREATE TABLE public.storyline
(
storyline_id bigint NOT NULL,
user_id bigint NOT NULL,
t_min timestamp without time zone NOT NULL,
t_max timestamp without time zone NOT NULL,
...
segment_time tsrange, -- as tsrange(t_min, t_max, '[)')
CONSTRAINT storyline_pkey PRIMARY KEY (storyline_id)
)
CREATE INDEX i_storyline_uid_t
ON public.storyline
USING btree
(user_id, t_min, t_max);
CREATE INDEX i_storyline_uid_t_2
ON public.storyline
USING btree
(user_id, t_min, t_max DESC);
CREATE INDEX i_storyline_uid_tseg
ON public.storyline
USING gist
(user_id, segment_time);
And the waypoints:
CREATE TABLE public.waypoints
(
waypoint_id bigint NOT NULL,
user_id bigint,
tracked_at timestamp without time zone,
...
CONSTRAINT "none" PRIMARY KEY (waypoint_id)
)
CREATE INDEX i_waypoints_id
ON public.waypoints
USING btree
(waypoint_id);
CREATE INDEX i_waypoints_uid_t
ON public.waypoints
USING btree
(user_id, tracked_at);
Aso to the explain analyze: I would love to do that but so far I lost patience after waiting for more than a day. I assumed that if a simple traversal of the big table lasts 2-3 minutes then adding unique counterparts from a much smaller other table should be feasible in a couple of hours at most. But I can let it run for longer and let you know the results.