Query aggregated data with a given sampling time

500 Views Asked by At

Suppose my raw data is:

Timestamp   High Low Volume
10:24.22345 100   99  10
10:24.23345 110  97   20
10:24.33455 97    89  40
10:25.33455 60    40  50
10:25.93455 40    20  60

With a sample time of 1 second, the output data should be as following (add additional column):

Timestamp   High Low Volume Count
10:24       110   89  70     3
10:25       60    20  110    2

The sampling unit from varying from 1 second, 5 sec, 1 minute, 1 hour, 1 day, ...

How to query the sampled data in quick time in the PostgreSQL database with Rails?

I want to fill all the interval by getting the error

ERROR:  JOIN/USING types bigint and timestamp without time zone cannot be matched

SQL

SELECT 
       t.high,
       t.low
FROM 
(

  SELECT generate_series(
    date_trunc('second', min(ticktime)) ,
    date_trunc('second', max(ticktime)) ,
    interval '1 sec'
  ) FROM czces  AS g (time)

  LEFT JOIN
  (
    SELECT 
      date_trunc('second', ticktime) AS time ,
      max(last_price) OVER w AS high ,
      min(last_price) OVER w AS low 
   FROM czces
   WHERE product_type ='TA' AND contract_month = '2014-08-01 00:00:00'::TIMESTAMP 
     WINDOW w AS (
      PARTITION BY date_trunc('second', ticktime)
      ORDER BY ticktime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
      )
  ) t USING (time)



  ORDER BY 1 
) AS t ;
1

There are 1 best solutions below

2
On

Simply use date_trunc() before you aggregate. Works for basic time units 1 second, 1 minute, 1 hour, 1 day - but not for 5 sec. Arbitrary intervals are slightly more complex, see link below!

SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
     , max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

If there are no rows for a sample point, you get no row in the result. If you need one row for every sample point:

SELECT g.timestamp, t.high, t.low, t.volume, t.ct
FROM  (SELECT generate_series(date_trunc('second', min(timestamp))
                             ,date_trunc('second', max(timestamp))
                             ,interval '1 sec') AS g (timestamp) -- or minute ...
LEFT JOIN  (
   SELECT date_trunc('second', timestamp) AS timestamp           -- or minute ...
        , max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
   FROM   tbl
   GROUP  BY 1
   ) t USING (timestamp)
ORDER  BY 1;

The LEFT JOIN is essential.

For arbitrary intervals:

Aside: Don't use timestamp as column name. It's a basic type name and a reserved word in standard SQL. It's also misleading for data that's not actually a timestamp.