Because of a less-than-ideal design that cannot be changed now, I am in a somewhat difficult situation. I am logging resource consumption (think RAM, for example) per process/computer at a fixed interval. The interval is not 100% precise, however.
The table of event logs may look like this. Here I am using fractions of an hour from some arbitrary moment as an example; the real data uses UTC timestamps.
| timestamp | ID of source | ID of system | resource usage |
|---|---|---|---|
| 0 | server 1 | mysystem | 100 |
| 0.01 | server 2 | mysystem | 50 |
| 0.02 | server 3 | mysystem | 10 |
| 7.99 | server 1 | mysystem | 40 |
| 8.01 | server 2 | mysystem | 30 |
| 8.01 | server 3 | mysystem | 20 |
My task is to report usage for each system, each of which can have several of these servers independently logging their usage. The naive idea I originally implemented was to divide the timestamp by the logging interval (floor division) and then sum the records that share the resulting value and system ID. The table above illustrates perfectly why that won't work: if I am logging every 8 hours but there is a slight offset, the wrong values get grouped together. Here, the 7.99 record from server 1 lands in the first bucket, while the 8.01 records from servers 2 and 3 land in the second.
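To make the failure concrete, here is a minimal Python sketch of the naive floor-division grouping, run on the rows from the table. The names `records`, `group_by_floor` and `INTERVAL_HOURS` are just illustrative; my real code works on UTC timestamps, not hour fractions.

```python
from collections import defaultdict

INTERVAL_HOURS = 8  # logging interval used in the example

# (timestamp in hours, source ID, system ID, resource usage), copied from the table
records = [
    (0.00, "server 1", "mysystem", 100),
    (0.01, "server 2", "mysystem", 50),
    (0.02, "server 3", "mysystem", 10),
    (7.99, "server 1", "mysystem", 40),
    (8.01, "server 2", "mysystem", 30),
    (8.01, "server 3", "mysystem", 20),
]


def group_by_floor(rows, interval):
    """Sum usage per (system, bucket), where bucket = floor(timestamp / interval)."""
    totals = defaultdict(int)
    for timestamp, _source, system, usage in rows:
        bucket = int(timestamp // interval)
        totals[(system, bucket)] += usage
    return dict(totals)


floor_totals = group_by_floor(records, INTERVAL_HOURS)
print(floor_totals)
# {('mysystem', 0): 200, ('mysystem', 1): 50}
```

The totals I actually want are 160 (100 + 50 + 10) for the first round of logging and 90 (40 + 30 + 20) for the second, but the 7.99 record is counted in the first bucket.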
You might think that round() instead of floor() division would help, but different systems started logging at different times, so each one also has a constant offset from the 8-hour interval.
What would you propose for grouping these records accurately?
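As a rough illustration of the round() problem, the sketch below rounds instead of flooring. The `mysystem` rows come from the table; the `othersystem` rows are made-up data for a hypothetical system that started logging about 4 hours later, and all function and variable names are only for illustration.

```python
from collections import defaultdict

INTERVAL_HOURS = 8  # logging interval used in the example


def group_by_round(rows, interval):
    """Sum usage per (system, bucket), where bucket = round(timestamp / interval)."""
    totals = defaultdict(int)
    for timestamp, _source, system, usage in rows:
        bucket = round(timestamp / interval)
        totals[(system, bucket)] += usage
    return dict(totals)


# Rows from the table: logging is aligned to multiples of 8 hours.
aligned = [
    (0.00, "server 1", "mysystem", 100),
    (0.01, "server 2", "mysystem", 50),
    (0.02, "server 3", "mysystem", 10),
    (7.99, "server 1", "mysystem", 40),
    (8.01, "server 2", "mysystem", 30),
    (8.01, "server 3", "mysystem", 20),
]

# Hypothetical rows: a system whose logging started ~4 hours off the 8-hour grid.
shifted = [
    (3.99, "server A", "othersystem", 100),
    (4.01, "server B", "othersystem", 50),
    (11.99, "server A", "othersystem", 40),
    (12.01, "server B", "othersystem", 30),
]

aligned_totals = group_by_round(aligned, INTERVAL_HOURS)
shifted_totals = group_by_round(shifted, INTERVAL_HOURS)

print(aligned_totals)
# {('mysystem', 0): 160, ('mysystem', 1): 90}
print(shifted_totals)
# {('othersystem', 0): 100, ('othersystem', 1): 90, ('othersystem', 2): 30}
```

Rounding fixes the aligned system, but for the shifted one the readings straddle the rounding boundary: the wanted totals are 150 and 70, and instead the 4.01 and 11.99 records end up in the same bucket.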