Query the most recent timestamp (MAX/Last) for a specific key, in Influx

32.5k Views Asked by At

Using InfluxDB (v1.1), I have the requirement where I want to get the last entry timestamp for a specific key. Regardless of which measurement this is stored and regardless of which value this was.

The setup is simple, where I have three measurements: location, network and usage.
There is only one key: device_id.

In pseudo-code, this would be something like:

# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';

The question: What would be the most efficient way of querying this?

The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.

I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.

3

There are 3 best solutions below

2
On

Unfortunately there wont be single query that will get you what you're looking for. You'll have to do a bit of work client side.

The query that you'll want is

SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'

You'll need to run this query for each measurement.

SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'

From there you'll get the one with the greatest time stamp

> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value),     time from usage where device_id = 'x';
name: location
time                last
----                ----
1483640697584904775 3

name: network
time                last
----                ----
1483640714335794796 4

name: usage
time                last
----                ----
1483640783941353064 4
2
On

I tried this and it worked for me in a single command :

SELECT last(<field name>), time FROM location, network, usage WHERE device_id = 'x'

The result I got :

name: location
time                last
----                ----
1483640697584904775 3

name: network
time                last
----                ----
1483640714335794796 4

name: usage
time                last
----                ----
1483640783941353064 4
3
On

tl;dr; The first() and last() selectors will NOT work consistently if the measurement have multiple fields, and fields have NULL values. The most efficient solution is to use these queries

First:

SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1

Last:

SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1

Explanation:

If you have a single field in your measurement, then the suggested solutions will work, but if you have more than one field and values can be NULL then first() and last() selectors won't work consistently and may return different timestamps for each field. For example, let's say that you have the following data set:

time                   fieldKey_1     fieldKey_2     device
------------------------------------------------------------
2019-09-16T00:00:01Z   NULL           A              1
2019-09-16T00:00:02Z   X              B              1
2019-09-16T00:00:03Z   Y              C              2
2019-09-16T00:00:04Z   Z              NULL           2

In this case querying

SELECT first(fieldKey_1) FROM <measurement> WHERE device = "1" 

will return

time                   fieldKey_1
---------------------------------
2019-09-16T00:00:02Z   X         

and the same query for first(fieldKey_2) will return a different time

time                   fieldKey_2
---------------------------------
2019-09-16T00:00:01Z   A

A similar problem will happen when querying with last.

And in case you are wondering, it wouldn't do querying 'first(*)' since you'll get an 'epoch-0' time in the results, such as:

 time                   first_fieldKey_1    first_fieldKey_2
 -------------------------------------------------------------
 1970-01-01T00:00:00Z   X                   A

So, the solution would be querying using combinations of LIMIT and ORDER BY. For instance, for the first time value you can use:

SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1

and for the last one you can use

SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1

It is safe and fast as it will relay on indexes.

Is curious to mention that this more simple approach was mentioned in the thread linked in the opening post, but was discarded. Maybe it was just lost overlooked.

Here there's a thread in InfluxData blogs about the subject also suggesting to use this approach.