Elasticsearch filter the maximum value document

Question


3.8k Views Asked by Muhamamd Awais At 07 June 2025 at 17:51

I am trying to get, from records that share the same name, the document with the maximum value. For example, I have 3 users, 2 of whom have the same name but different follower counts. I want to return only one of those two documents, the one with the highest follower_count.

{ id: 1, name: "John Greenwood", follower_count: 100 }
{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }

So the result would be:

{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }

Of the two documents with the same name, the one with the most followers wins, and the document with the unique name is returned as well.

My mapping is as follows:
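To make the requirement concrete, here is the same selection written as a plain-Python sketch over the three sample documents. It does not touch Elasticsearch; the function name `top_user_per_name` is made up for illustration.

```python
# Plain-Python illustration of the desired result; no Elasticsearch involved.
users = [
    {"id": 1, "name": "John Greenwood", "follower_count": 100},
    {"id": 2, "name": "John Greenwood", "follower_count": 200},
    {"id": 3, "name": "John Underwood", "follower_count": 300},
]


def top_user_per_name(docs):
    """Keep, for each name, the document with the highest follower_count."""
    best = {}
    for doc in docs:
        current = best.get(doc["name"])
        if current is None or doc["follower_count"] > current["follower_count"]:
            best[doc["name"]] = doc
    # Sort by id only to get a stable, readable order.
    return sorted(best.values(), key=lambda d: d["id"])


result = top_user_per_name(users)
```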

"users-development" : {
    "mappings" : {
      "user" : {
        "dynamic" : "false",
        "properties" : {
          "follower_count" : {
            "type" : "integer"
          },
          "name" : {
            "type" : "string",
            "fields" : {
              "exact" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          }
        }
      }
    }
}

This is where I have been stuck for a long time:

         {
            query: {
              filtered: {
                filter: {
                  bool: {
                    must: [
                      { terms: { "name.exact": [ "John Greenwood", "John Underwood" ] } },
                    ]
                  }
                }
              }
            },

            aggs: {
              max_follower_count: { max: { field: 'follower_count' } }
            },

            size: 1000,
          }

Any suggestions, please?


There are 2 best solutions below

**Waldemar Neto** · Answer 1

Elasticsearch has a tool made exactly for this kind of question: aggregations. See the examples below. First of all, in your case you need to aggregate by the full name, including spaces, so your name field needs to be not_analyzed, like this:

`PUT /index
{
  "mappings": {
    "users" : {
      "properties" : {
        "name" : {
          "type" :    "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}`

Now your query will be like this one:

`POST /index/users/_search
{
   "aggs": {
      "users": {
         "terms": {
            "field": "name"
         },
         "aggs": {
            "followers": {
               "max": {
                  "field": "follower_count"
               }
            }
         }
      }
   }
}`

I just aggregated by name and used a max metric to get the highest follower count.

The response will be like this:

`"aggregations": {
      "users": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "John Greenwood",
               "doc_count": 2,
               "followers": {
                  "value": 200
               }
            },
            {
               "key": "John Underwood",
               "doc_count": 1,
               "followers": {
                  "value": 300
               }
            }
         ]
      }
   }`
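As a rough sketch of how a client could read that response, the snippet below turns the buckets into a name-to-maximum mapping. The `response` dict is the sample above copied by hand, and `max_followers_by_name` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Sample aggregation response, copied from the answer above.
response = {
    "aggregations": {
        "users": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "John Greenwood", "doc_count": 2, "followers": {"value": 200}},
                {"key": "John Underwood", "doc_count": 1, "followers": {"value": 300}},
            ],
        }
    }
}


def max_followers_by_name(resp, agg_name="users", metric_name="followers"):
    """Map each bucket key (the name) to the value of its max sub-aggregation."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket[metric_name]["value"] for bucket in buckets}


max_by_name = max_followers_by_name(response)
```

Note that this gives the maximum follower count per name, not the id of the document that holds it.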

I hope this helps. Use aggregations whenever you need to group data and compute a metric, such as a sum or a maximum, over the values.

**IanGabes** · Answer 2

OK, I think you are looking for something along these lines, using the terms aggregation:

{
   "query": {
      "terms": { "name.exact": [ "John Greenwood", "John Underwood" ] }
   },
   "aggs": {
      "max_follower_count": {
         "terms": {
            "field":"name.exact"
         },
         "aggs":{
             "max_follow" : { "max" : { "field" : "follower_count" } }
         }
      }
   },
   "size": 1000
}

The terms aggregation makes a bucket for each unique value of name.exact, which here will be only the values specified in your terms query. So we now have a bucket for each John, and we can use the max aggregation to find the highest follower count in each one. The max aggregation operates on each bucket of its parent aggregation.

Each of these unique terms will then have its max value of follower_count computed, and displayed in the bucket. Results look as follows:

... //query results of just the terms query up here
"aggregations": {
  "max_follower_count": {
     "doc_count_error_upper_bound": 0,
     "sum_other_doc_count": 0,
     "buckets": [
        {
           "key": "John Greenwood",
           "doc_count": 2,
           "max_follow": {
              "value": 200
           }
        },
        {
           "key": "John Underwood",
           "doc_count": 1,
           "max_follow": {
              "value": 300
           }
        }
     ]
  }
}

The terms aggregation comes with a few caveats about how it does the counting, and the linked documentation should be pretty clear on that.
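Both queries above return the maximum follower_count per name, but not the document itself (for example its id). One possible way to get the whole document is a `top_hits` sub-aggregation sorted by follower_count, with a size of 1. The sketch below only builds the request body as a Python dict and reads a response of the documented `top_hits` shape; the aggregation names `by_name` and `top_user` and both helper functions are assumptions made for illustration, and it assumes an Elasticsearch version that supports `top_hits`.

```python
import json


def build_top_user_query(names):
    """Build a search body that keeps one document per name: the most followed."""
    return {
        "size": 0,  # only the aggregation results are needed, not the plain hits
        "query": {"terms": {"name.exact": names}},
        "aggs": {
            "by_name": {  # hypothetical aggregation name
                "terms": {"field": "name.exact", "size": len(names)},
                "aggs": {
                    "top_user": {  # hypothetical aggregation name
                        "top_hits": {
                            "size": 1,
                            "sort": [{"follower_count": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def extract_top_users(response):
    """Pull the single top document out of each name bucket."""
    buckets = response["aggregations"]["by_name"]["buckets"]
    return [b["top_user"]["hits"]["hits"][0]["_source"] for b in buckets]


body = build_top_user_query(["John Greenwood", "John Underwood"])
print(json.dumps(body, indent=2))
```

If two documents with the same name tie on follower_count, this sort alone does not define which one is returned.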
