Using date_histogram with fixed_interval (30d) unexpected bucket start

2.3k Views Asked by At

I have a requirement to get data aggregated per 30 days (not month) so I'm using a date_histogram with "fixed_interval": "30d" to get that data. For example, if the user wants the last 90 days aggregations, there should be 3 buckets: [90-60, 60-30, 30-0]. Taking today's date (18-Mar-2021), I would want buckets [18-Dec,17-Jan,16-Feb].

However, what I actually get is [4-Dec,3-Jan,2-Feb,4-Mar]. The first bucket starts way earlier than any data is available, which also means an additional bucket than expected is needed in the end.

I found out that you can't easily tell when your buckets are meant to start (e.g. I want my first bucket to start at today-90 days). Buckets seem to start from 1970-01-01 according to what I could find (e.g. this) and the documentation kinda says this as well (this link, though it doesn't go into depth of the impact).

With this in mind, I worked out that I could use offset with an "interesting formula" so that I get the correct buckets that I need. E.g.:

GET /my_index/_search?filter_path=aggregations
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "range" : {
          "@timestamp" : {
              "gte" : "TODAY - 90/60/30",
              "lt" : "TODAY"
          }}
        }
      ]
    }
  },
  "aggs": {
    "discussion_interactions_chart": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "30d",
        "format": "yyyy-MM-dd",
        "offset":    "(DAYS(@timestamp.gte, 1970-01-01) % 30)d"
      }
    }
  }
}

(obviously this query doesn't work directly, I build the variables in code which for the example of 18-Mar-2021 offset is 14)

So basically offset is calculated as the number of days between my lower bound date and epoch, and then mod that value by 30. This seems to work but it's kinda hard to justify this logic on a code review. Is there a nicer solution to this?

1

There are 1 best solutions below

0
On

Here's a Python implementation of the answer in your question (which you really deserve upvotes for, it's clever and helped me):

fixed_interval_days = 90

# offset needed to make fixed_interval histogram end on today's date (it starts the intervals at 1970-01-01)
offset_days = (datetime.datetime.utcnow() - datetime.datetime(1970, 1, 1)).days % fixed_interval_days

...
    A(
        "date_histogram",
        fixed_interval=f"{fixed_interval_days}d",
        offset=f"{offset_days}d",