I have a table where I record events. These events will come into the system once a minute, per user - on the order of 10's of thousands a minute. But I don't need to save all of them.
If an event comes in within 90 seconds of the previous event, I want to update the previous row. If it's been more than 90 seconds, I want to insert a new row.
Example table:
create table events (
id serial,
user_id int references users(id) not null,
created_at timestamp not null default now(),
updated_at timestamp not null default now(),
...some other event columns here...
);
create index idx_events_user_id_updated_at on events (user_id, updated_at desc);
And the pseudocode would be something like:
- Fetch last event for user
- If
updated_at
is within the last 90 seconds:- Update row, including
updated_at
withnow()
- Update row, including
- Otherwise:
- Insert new row
Is there a way to do this with a single Postgres statement?
I'm aware of on conflict
, but I don't think it'll work for this use-case. The (user_id, updated_at)
pair could be defined as a unique constraint, which could be used to trigger an on conflict
, but the timestamps are arbitrary. These events come in "every minute", but not exactly on a minute (or even exactly a minute apart, due to network latency, server latency, etc etc, hence using 90 seconds to give a 30 second buffer). Truncating the timestamps to the minute would reduce the feature's usefulness, so I'd hate to do that just to handle upserts more cleanly.
Can be done with an UPSERT command. You need an
EXCLUSION
constraint on the timestamp range. For the leading integer columnuser_id
, the additional modulebtree_gist
must be installed. See:Notably, the same doesn't work for
timestamptz
becausetimestamptz + interval
is onlySTABLE
, whiletimestamp + interval
isIMMUTABLE
as required for the implied GiST index (or any index for that matter).Query:
fiddle
This assumes at most one row per
user_id
in the input.And no concurrent, competing writes. Else, there may be race conditions due to the tiny, but unavoidable time gap between
INSERT
andUPDATE
. (ON CONFLICT ... DO UPDATE
is not allowed based on an exclusion constraint.)Also, no ambiguous data types in the input, or you must add an explicit cast. See:
Depending on details of your case, there may well be faster solutions. Like, just write all incoming rows (with a substantially faster
COPY
command), and later consolidate in bulk operations.While sticking with this solution, it may be faster to add a generated column for the
tsrange
and base the exclusion constraint (and its implied GiST index) on that. Bloats the table row a bit, though. See: