Clickhouse MergeTree not aggregating inserts

57 Views Asked by At

I want to create a clickhouse table that stores the max value of stats over a period of every one hour. My base table looks like below

CREATE TABLE base_table
(
    `time` DateTime,
    `stats` Float64
)
ENGINE = MergeTree
ORDER BY time
SETTINGS index_granularity = 8192

The materialized view look like below

CREATE MATERIALIZED VIEW hourly_max_stats_mv TO hourly_max_stats
(
    `hour_start` DateTime,
    `hour_end` DateTime,
    `max_stats` Float64
) AS
SELECT
    toStartOfHour(time) AS hour_start,
    addHours(toStartOfHour(time), 1) AS hour_end,
    max(stats) AS max_stats
FROM base_table
GROUP BY
    hour_start,
    hour_end

When I insert the below 2 rows in the base table, I am expecting the hourly_max_stats_mv table to contain only 1 row with the max value over an interval of 1 hour.

INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:30:00', 10);
INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:40:00', 20);

Instead I see both the records in hourly_max_stats. I checked the clickhouse logs and there are no errors. I even waited for more than an hour to let the background process aggregate the result, but I still see 2 records. Any idea what am I doing wrong?

1

There are 1 best solutions below

1
Rich Raposa On BEST ANSWER

You are using MergeTree on your destination table when you actually want to be using AggregatingMergeTree. Here is what you need (I'm a little confused as to why you need the "end" of an hour so I removed it):

CREATE TABLE hourly_max_stats (
    hour_start DateTime,
    max_stats SimpleAggregateFunction(max, Float64)
)
ENGINE = AggregatingMergeTree
PRIMARY KEY hour_start;

CREATE  MATERIALIZED VIEW hourly_max_stats_mv 
TO hourly_max_stats
AS
SELECT
    toStartOfHour(time) AS hour_start,
    maxSimpleState(stats) AS max_stats
FROM base_table
GROUP BY
    hour_start;

INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:30:00', 10);
INSERT INTO base_table (time, stats) VALUES ('2024-01-18 12:40:00', 20);


SELECT
    hour_start,
    max(max_stats)
FROM hourly_max_stats
GROUP BY hour_start;

Query id: 194b3229-f94f-4503-abc7-552a11f4fc7a

┌──────────hour_start─┬─max(max_stats)─┐
│ 2024-01-18 12:00:00 │             20 │
└─────────────────────┴────────────────┘

1 row in set. Elapsed: 0.025 sec.

I explain the details here: https://www.youtube.com/watch?v=-Ma2ohy_6lM&list=PL0Z2YDlm0b3gtIdcZI3B_8bMJclDOvY8s&index=10