How to divide the returned documents into a fix number of buckets in Elasticsearch

185 Views Asked by At

I have a question that I thought was simple to google but I couldn't find an answer.

I have an index of articles with an optional field price among the many.

I'd like to create an aggregation that divides all the articles matching the query in buckets that match these properties:

  • a bucket for all items with price = 0
  • a fixed number of buckets for all the items with price > 0 (say 10), possibly with a range included.

To give an idea of the ideal result, given a search returning items with prices ranging from 0 to 2000, the result of the aggregation could be something like:

[
   {"from": 0, "to": 0, "doc_count": 100},
   {"from": 1, "to": 200, "doc_count": 23},
   {"from": 201, "to": 400, "doc_count": 11},
   {"from": 401, "to": 600, "doc_count": 23},
   {"from": 601, "to": 800, "doc_count": 120},
   {"from": 801, "to": 1000, "doc_count": 1100},
   {"from": 1001, "to": 1200, "doc_count": 25},
   {"from": 1201, "to": 1400, "doc_count": 12},
   {"from": 1401, "to": 1600, "doc_count": 24},
   {"from": 1601, "to": 1800, "doc_count": 42},
   {"from": 1801, "to": 2000, "doc_count": 52}
]

I have looked into histogram aggregation but it focuses on fixed intervals rather than fixed amount of buckets and I couldn't find a way to isolate the items with price = 0.

{
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 2500.0
      }
    }
  },
  "query": {
    "match_all": {}
  },
  "size": 0
}

With some backend processing I was able to obtain the following result

[
  {
    "Key": {
      "From": 0,
      "To": 2499
    },
    "Items": 139
  },
  {
    "Key": {
      "From": 2500,
      "To": 4999
    },
    "Items": 17
  },
  {
    "Key": {
      "From": 5000,
      "To": 7499
    },
    "Items": 7
  },
  {
    "Key": {
      "From": 7500,
      "To": 9999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 10000,
      "To": 12499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 12500,
      "To": 14999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 15000,
      "To": 17499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 17500,
      "To": 19999
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 20000,
      "To": 22499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 22500,
      "To": 24999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 25000,
      "To": 27499
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 27500,
      "To": 29999
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 30000,
      "To": 32499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 32500,
      "To": 34999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 35000,
      "To": 37499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 37500,
      "To": 39999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 40000,
      "To": 42499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 42500,
      "To": 44999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 45000,
      "To": 47499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 47500,
      "To": 49999
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 50000,
      "To": 52499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 52500,
      "To": 54999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 55000,
      "To": 57499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 57500,
      "To": 59999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 60000,
      "To": 62499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 62500,
      "To": 64999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 65000,
      "To": 67499
    },
    "Items": 1
  },
  {
    "Key": {
      "From": 67500,
      "To": 69999
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 70000,
      "To": 72499
    },
    "Items": 0
  },
  {
    "Key": {
      "From": 72500,
      "To": 74999
    },
    "Items": 2
  },
  {
    "Key": {
      "From": 75000,
      "To": null
    },
    "Items": 1
  }
]

I also have discarded using the range aggregation as it requires the definition of well-defined buckets by their boundaries.

I hope I gave enough context about what I am trying to achieve and what I attempted.

Thanks.

EDIT: added paragraph about range aggregation.

1

There are 1 best solutions below

1
On

Range aggregation will allow you to create price ranges and so you will have a controlled limit of buckets you want

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html

{
  "aggs": {
    "price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0.0, "to": 1.0 },
          { "from": 1.0, "to": 10.0 },
          { "from": 10.0, "to": 20.0 },
          { "from": 20.0 }
        ]
      }
    }
  }, 
  "size": 0
}