Speedup of Time range Queries


Situation:

Table waypoints is relatively large (150M rows) and contains (among others) the columns user_id and a timestamp tracked_at.

Table storyline is a smaller table (300K rows) that contains information about time episodes (t_min to t_max, also available through the tsrange column segment_time), user_id and more.

Aim: Add the storyline information to the waypoints wherever the user_id matches and the timestamp falls within the segment's time range (there is exactly one match per waypoint).

EXPLAIN  CREATE TABLE t4 AS SELECT
  w.waypoint_id,
  s.storyline_id
FROM waypoints w JOIN storyline s
ON  w.tracked_at <@ s.segment_time and w.user_id = s.user_id;

Question: How can I use indexes here to speed up the join?

What I thought would be necessary here, according to this, is a btree_gist index on (user_id, segment_time) on storyline or, if I use a query with t_min and t_max instead of the range, a normal index on (user_id, t_min, t_max DESC). However, I always get a query plan similar to the following:

Hash Join  (cost=329264.29..8085010946.88 rows=553075874 width=16)
  Hash Cond: (w.user_id = s.user_id)
  Join Filter: (w.tracked_at <@ s.segment_time)
  ->  Seq Scan on waypoints w  (cost=0.00..4503125.76 rows=152813376 width=24)
  ->  Hash  (cost=323059.13..323059.13 rows=496413 width=38)
        ->  Seq Scan on storyline s  (cost=0.00..323059.13 rows=496413 width=38)

I understand the sequential scan on waypoints (after all, I want all points back), but I feel as if an index scan rather than a sequential scan on storyline should be much faster. Is there a reason why a sequential scan would be faster here? If not, what indexes are needed to achieve an index scan?
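One rewrite I have seen suggested for this kind of join is a LATERAL subquery, which encourages the planner to do a per-row index lookup into storyline instead of hashing it; this is only a sketch under the assumption that the existing gist index i_storyline_uid_tseg on (user_id, segment_time) is usable:

```sql
-- Sketch: per-waypoint lookup into storyline via LATERAL.
-- Assumes the gist index on (user_id, segment_time) exists.
CREATE TABLE t4 AS
SELECT w.waypoint_id, s.storyline_id
FROM waypoints w
CROSS JOIN LATERAL (
  SELECT storyline_id
  FROM storyline s
  WHERE s.user_id = w.user_id
    AND w.tracked_at <@ s.segment_time
  LIMIT 1  -- safe here because there is exactly one match per waypoint
) s;
```

The planner may still decide a hash join is cheaper for 150M driving rows, so this is something to test, not a guaranteed index scan.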

Edit: As recommended by a_horse_with_no_name, here are the create statements of the tables:

CREATE TABLE public.storyline
(
  storyline_id bigint NOT NULL,
  user_id bigint NOT NULL,
  t_min timestamp without time zone NOT NULL,
  t_max timestamp without time zone NOT NULL,
  ...
  segment_time tsrange,  -- as tsrange(t_min, t_max, '[)')
  CONSTRAINT storyline_pkey PRIMARY KEY (storyline_id)
)

CREATE INDEX i_storyline_uid_t
  ON public.storyline
  USING btree
  (user_id, t_min, t_max);

CREATE INDEX i_storyline_uid_t_2
  ON public.storyline
  USING btree
  (user_id, t_min, t_max DESC);

CREATE INDEX i_storyline_uid_tseg
  ON public.storyline
  USING gist
  (user_id, segment_time);
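Worth noting: a composite gist index that mixes a plain bigint column with a tsrange only works if the btree_gist extension is installed, since it supplies the gist operator classes for scalar types (skip this if it is already present):

```sql
-- btree_gist provides gist operator classes for scalar types such as
-- bigint, which the composite (user_id, segment_time) gist index needs.
CREATE EXTENSION IF NOT EXISTS btree_gist;
```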

And the waypoints:

CREATE TABLE public.waypoints
(
  waypoint_id bigint NOT NULL,
  user_id bigint,
  tracked_at timestamp without time zone,
  ...
CONSTRAINT "none" PRIMARY KEY (waypoint_id)
)

CREATE INDEX i_waypoints_id
  ON public.waypoints
  USING btree
  (waypoint_id);

CREATE INDEX i_waypoints_uid_t
  ON public.waypoints
  USING btree
  (user_id, tracked_at);

As to the EXPLAIN ANALYZE: I would love to do that, but so far I have lost patience after waiting for more than a day. I assumed that if a simple traversal of the big table takes 2-3 minutes, then adding the unique counterparts from a much smaller table should be feasible in a couple of hours at most. But I can let it run for longer and report the results.
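In the meantime, a way to get real timings without waiting a day might be to run EXPLAIN ANALYZE on a bounded sample of waypoints; the WHERE clause below is just an assumed way to cap the runtime, not part of the real query:

```sql
-- Bounded sample: analyze the join for a slice of waypoints only,
-- so EXPLAIN ANALYZE finishes in a reasonable time.
EXPLAIN (ANALYZE, BUFFERS)
SELECT w.waypoint_id, s.storyline_id
FROM waypoints w JOIN storyline s
  ON w.tracked_at <@ s.segment_time AND w.user_id = s.user_id
WHERE w.waypoint_id < 100000;  -- hypothetical cutoff for sampling
```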
