I have following timeseries data table,
| Id | Object_Id | Datetime | Sensor1 | Sensor2 |
|---|---|---|---|---|
| 1 | 175 | '2022-03-24 22:01:00' | 0 | 0 |
| 2 | 175 | '2022-03-24 22:02:00' | 0 | 0 |
| 3 | 175 | '2022-03-24 22:03:00' | 5.5699381666667 | 81.342836833333 |
| 4 | 175 | '2022-03-24 22:04:00' | 5.5668366666667 | 81.281143 |
| ... | ... | ... | ... | ... |
| 48 | 175 | '2022-03-24 22:48:00' | NULL | NULL |
| ... | ... | ... | ... | ... |
in above table, I have sensor1, sensor2 upto sensor1000 columns and each has millions of time series records.
Scenario1:
I want to retrieve starttime(min datetime) and endtime(max datetime) records for multiple sensors eg. sensor1 & sensor2.
I have sensors array like below,
sensors = ["Sensor1", "Sensor2", ..., "sensorN"];
I loop through above array and construct below queries for Sensors eg for Sensor1,
(SELECT datetime,id,sensor1
FROM timeseries
where datetime between '2021-01-20 22:01:00' and '2024-01-20 00:17:00' limit 1)
Union ALL
(SELECT datetime,id,sensor1
FROM timeseries
where datetime between '2021-01-20 22:01:00' and '2024-01-20 00:17:00' ORDER BY datetime DESC limit 1)
for Sensor2,
(SELECT datetime,id,sensor2
FROM timeseries
where datetime between '2021-01-20 22:01:00' and '2024-01-20 00:17:00' limit 1)
Union ALL
(SELECT datetime,id,sensor2
FROM timeseries
where datetime between '2021-01-20 22:01:00' and '2024-01-20 00:17:00' ORDER BY datetime DESC limit 1)
This query works as expected. Observe that I created individual query for sensor1 and sensor2 as I want to handle failure scenario for individual sensor.
The problem is in real life scenario, I have millions of records, around 50 to 70 GB data.
When I run above query (for just sensor1 & sensor2), it takes significant amount of time to return the result. What if user is trying with multiple sensors (for functional requirement I can select up to 35 sensors in one go). Trust me it takes time and we can observe it in real implementation.
In this demo it works smoothly as It has only 150 records.
How can I improve/optimize the performance of this query?
FYI, for each requested sensor, query is prepared in Node/express app and then app connects to MySQL server where I execute each query using for loop:
const queryResults = await Promise.all(
queries.map(async (query) => {
return new Promise((resolve, reject) =>
db.query(query, [], (err, result) => {
if (err) {
return resolve([]); // If individual query fails then return empty array
} else {
return resolve(result);
}
})
);
})
);