I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete. To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).
EDIT
It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?
It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.
Let's say you have a gigantic table
measure
with two columnsdatestamp
andtemp
.Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining
trunc
in a moment.That works for any reasonable definition of
trunc
. In this casetrunc(t)
returns the beginning of the six-minute period in whicht
occurs. So,trunc('1942-12-07 08:45:17')
gives1942-12-07 08:42:00
).Here's a query that works for every six minute interval.
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
Then your query will say
If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of
6
.You'll need different
trunc()
functions for hours, etc.The
trunc()
function for daily summaries isDATE(datestamp)
. For monthly summaries it isLAST_DAY(datestamp)
. For example,yields a month-by-month summary.