I've recently been playing around with TimeScaleDB however I am a little confused and need some pointers as to why my query is running slowly or to verify if that is the typical performance of a timescaledb query.

The dataset I am using is market tick data for a single date, of which I have loaded around 84 million records into my hypertable.

Here is an example of the type of data from my file:

2018-12-03 00:00:00.000344+00:00,2181.T,2018-12-03,2179,56300,2180,59500
2018-12-03 00:00:00.000629+00:00,1570.T,2018-12-03,20470,555118,20480,483857
2018-12-03 00:00:00.000631+00:00,2002.T,2018-12-03,2403,30300,2404,30200

My table has been created like this:


CREATE TABLE tt1 (
    time    TIMESTAMPTZ       NOT NULL,
    cusip   varchar(40)       NOT NULL,
    date    DATE              NULL,
    value   DOUBLE PRECISION,
    value2  DOUBLE PRECISION,
    value3  DOUBLE PRECISION,
    value4  DOUBLE PRECISION
);

I created two versions of the hypertable: tt1, which uses 1-minute chunks, and tt30m, which uses 30-minute chunks. Both tables follow the same schema above. I created the hypertable like so:

SELECT create_hypertable('tt1', 'time', chunk_time_interval => interval '1 minute');
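
The 30-minute version was created the same way, just with a larger chunk interval; roughly (assuming tt30m already exists with the schema above):

```sql
-- 30-minute chunks instead of 1-minute chunks
SELECT create_hypertable('tt30m', 'time', chunk_time_interval => interval '30 minutes');
```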

The time and cusip columns are indexed in both versions of the hypertable. Time is indexed by default when creating the hypertable, and I created the cusip index with the following:

  CREATE INDEX ON tt1(cusip, time DESC);

My query looks like this:

EXPLAIN ANALYZE SELECT time_bucket('15 minutes', time) AS fifteen_min,
  cusip, COUNT(*)
  FROM tt1
  WHERE time > timestamp '2018-12-03 05:10:06.174704-05' - interval '3 hours'
  GROUP BY fifteen_min, cusip
  ORDER BY fifteen_min DESC;

With 30-minute chunks, the query takes 25.969 seconds. Here is the query plan for it:

                                   QUERY PLAN
-----------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1679944.84..1685344.84 rows=40000 width=40) (actual time=25770.209..25873.410 rows=305849 loops=1)
   Group Key: (time_bucket('00:15:00'::interval, tt30m."time")), tt30m.cusip
   ->  Gather Merge  (cost=1679944.84..1684544.84 rows=40000 width=40) (actual time=25770.181..25885.080 rows=305849 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=1678944.83..1679044.83 rows=40000 width=40) (actual time=12880.868..12911.917 rows=152924 loops=2)
               Sort Key: (time_bucket('00:15:00'::interval, tt30m."time")) DESC, tt30m.cusip
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: external merge  Disk: 10976kB
               ->  Partial HashAggregate  (cost=1675387.29..1675887.29 rows=40000 width=40) (actual time=12501.381..12536.373 rows=152924 loops=2)
                     Group Key: time_bucket('00:15:00'::interval, tt30m."time"), tt30m.cusip
                     ->  Parallel Custom Scan (ChunkAppend) on tt30m  (cost=10680.22..1416961.58 rows=34456761 width=32) (actual time=0.020..7293.929 rows=24255398 loops=2)
                           Chunks excluded during startup: 14
                           ->  Parallel Seq Scan on _hyper_2_753_chunk  (cost=0.00..116011.42 rows=4366426 width=17) (actual time=0.037..1502.121 rows=7423073 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_755_chunk  (cost=0.00..108809.26 rows=4095539 width=17) (actual time=0.017..1446.248 rows=6962556 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_754_chunk  (cost=0.00..107469.27 rows=4056341 width=17) (actual time=0.015..1325.638 rows=6895917 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_756_chunk  (cost=0.00..99037.70 rows=3730381 width=17) (actual time=0.006..1206.708 rows=6341775 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_758_chunk  (cost=0.00..90757.67 rows=3421505 width=17) (actual time=0.017..1126.757 rows=5816675 loops=1)
 Time: 25968.520 ms (00:25.969)

With 1-minute chunks, the query takes 25.686 seconds. Here is the query plan:

                                   QUERY PLAN
-----------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1048760.27..1054160.27 rows=40000 width=25) (actual time=25306.291..25409.778 rows=305849 loops=1)
   Group Key: (time_bucket('00:15:00'::interval, tt1."time")), tt1.cusip
   ->  Gather Merge  (cost=1048760.27..1053360.27 rows=40000 width=25) (actual time=25306.282..25424.711 rows=305849 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=1047760.26..1047860.26 rows=40000 width=25) (actual time=12629.859..12665.190 rows=152924 loops=2)
               Sort Key: (time_bucket('00:15:00'::interval, tt1."time")) DESC, tt1.cusip
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: external merge  Disk: 10976kB
               ->  Partial HashAggregate  (cost=1044202.72..1044702.72 rows=40000 width=25) (actual time=12276.755..12311.071 rows=152924 loops=2)
                     Group Key: time_bucket('00:15:00'::interval, tt1."time"), tt1.cusip
                     ->  Parallel Custom Scan (ChunkAppend) on tt1  (cost=0.42..830181.18 rows=28536205 width=17) (actual time=0.013..7147.401 rows=24255398 loops=2)
                           Chunks excluded during startup: 430
                           ->  Parallel Seq Scan on _hyper_1_564_chunk  (cost=0.00..4776.72 rows=180218 width=17) (actual time=0.022..56.440 rows=306370 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_571_chunk  (cost=0.00..4632.82 rows=174066 width=16) (actual time=0.006..55.440 rows=295912 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_553_chunk  (cost=0.00..4598.08 rows=173526 width=17) (actual time=0.019..61.084 rows=294995 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_499_chunk  (cost=0.00..4586.53 rows=172922 width=17) (actual time=0.006..64.104 rows=293968 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_498_chunk  (cost=0.00..4528.29 rows=170504 width=17) (actual time=0.005..52.295 rows=289856 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_502_chunk  (cost=0.00..4509.36 rows=169949 width=17) (actual time=0.005..53.786 rows=288913 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_1_645_chunk  (cost=0.00..4469.19 rows=168735 width=17) (actual time=0.013..55.431 rows=286850 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)

Essentially, what I'm looking for is some pointers as to whether this is the expected performance of TimescaleDB, or whether there is a way to optimize this query.

I've already run the timescaledb-tune tool and accepted all the optimizations it suggested. I am running this on a Linux VM through VirtualBox. The VM has 20 GB of RAM, 250+ GB of HDD space, and 2 CPUs. The Postgres version is 11.6 and the TimescaleDB version is 1.5.0. Attached output of dump_meta_data here: dump meta data output

Many thanks for any responses :)


There is 1 best solution below


This query is going to need to scan all of the records in the 3-hour period in either case, and that scan is what's taking the time. There are a few options for speeding that sort of thing up. One is hardware: the query requires a fair amount of IO, and your box is rather small, so it is probably bottlenecked on IO; a larger box would help here. Varying the chunk size, on the other hand, is going to have little effect, since chunk size hardly affects this sort of query; in fact, I'd recommend larger chunks, as 84 million rows is not all that many. Another option is to use continuous aggregates to pre-compute some of that operation for you. If this is the type of query you're going to be running a lot, that can save you significant time as well as CPU/memory/IO.
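
Both suggestions can be sketched roughly like this, using the TimescaleDB 1.x continuous-aggregate syntax (the view name, column alias, and chunk interval below are illustrative, not tested against this dataset):

```sql
-- Pre-compute counts per 15-minute bucket per cusip (TimescaleDB 1.x syntax;
-- 2.x uses CREATE MATERIALIZED VIEW instead)
CREATE VIEW tt1_15min
WITH (timescaledb.continuous) AS
SELECT time_bucket('15 minutes', time) AS fifteen_min,
       cusip,
       COUNT(*) AS tick_count
FROM tt1
GROUP BY fifteen_min, cusip;

-- Queries then read the much smaller aggregate instead of the 84M raw rows
SELECT fifteen_min, cusip, tick_count
FROM tt1_15min
WHERE fifteen_min > now() - interval '3 hours'
ORDER BY fifteen_min DESC;

-- Use larger chunks going forward (only affects newly created chunks)
SELECT set_chunk_time_interval('tt1', interval '1 day');
```

Note that continuous aggregates require grouping by time_bucket on the hypertable's time column, which the query above already does, so it maps onto this feature naturally.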