Elasticsearch - Sort results of Terms aggregation by key string length

Question

Asked by AbrahamCoding on 13 July 2021 at 06:52

I am querying ES with a Terms aggregation to find the first N unique values of a string field foo where the field contains a substring bar, and the document matches some other constraints.

Currently I am able to sort the results by the key string alphabetically:

{
  "query": {other constraints},
  "aggs": {
    "my_values": {
      "terms": {
        "field": "foo.raw",
        "include": ".*bar.*",
        "order": {"_key": "asc"},
        "size": N
      }
    }
  }
}

This gives results like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,   
      "sum_other_doc_count": 145,           
      "buckets": [                        
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "iii_bar_iii",
          "doc_count": 1
        },
        {
          "key": "z_bar_z",
          "doc_count": 1
        }
      ]
    }
  }
}

How can I change the order option so that the buckets are sorted by the length of the strings in the foo key field, so that the results are like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,   
      "sum_other_doc_count": 145,           
      "buckets": [                        
        {
          "key": "z_bar_z",
          "doc_count": 1
        },
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "iii_bar_iii",
          "doc_count": 1
        }
      ]
    }
  }
}

I want this because a shorter string is closer to the search substring, so it is a 'better' match and should appear earlier in the results than a longer string. Any alternative way to sort the buckets by how similar they are to the original substring would also be helpful.

I need the sorting to occur in ES so that I only have to load the top N results from ES.

Original Q&A

**AbrahamCoding** · Accepted Answer · 2021-07-20T01:07:47.873000

I worked out a way to do this. I added a scripted max sub-aggregation under the terms aggregation, so each bucket gets the length of its key string as an extra metric. Every document in a bucket has the same foo.raw value, so the max is simply that key's length. I could then sort by this length metric first and by the actual key second, so keys of the same length are sorted alphabetically.

{
  "query": {other constraints},
  "aggs": {
    "my_values": {
      "terms": {
        "field": "foo.raw",
        "include": ".*bar.*",
        "order": [
          {"key_length": "asc"},
          {"_key": "asc"}
        ],
        "size": N
      },
      "aggs": {
        "key_length": {
          "max": {"script": "doc['foo.raw'].value.length()" }
        }
      }
    }
  }
}

This gave me results like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,   
      "sum_other_doc_count": 145,           
      "buckets": [                        
        {
          "key": "z_bar_z",
          "doc_count": 1
        },
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "dd_bar_dd",
          "doc_count": 1
        },
        {
          "key": "bbb_bar_bbb",
          "doc_count": 1
        }
      ]
    }
  }
}

which is what I wanted.
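
For anyone building this request from client code, here is a minimal Python sketch of the same body. The helper names build_length_sorted_terms_agg and expected_order are made up for illustration, the match_all default only stands in for the "other constraints" query, and the sketch assumes the substring contains no regex metacharacters. It only builds a dict and does not talk to a cluster; expected_order mimics locally the ordering the aggregation is meant to produce.

```python
import json


def build_length_sorted_terms_agg(field, substring, size, query=None):
    """Build a search body whose terms buckets are ordered by key length, then by key.

    Hypothetical helper: `field` is the keyword sub-field (foo.raw in the post),
    `substring` is assumed to contain no regex metacharacters.
    """
    return {
        # Placeholder for the "other constraints" in the original query.
        "query": query if query is not None else {"match_all": {}},
        "aggs": {
            "my_values": {
                "terms": {
                    "field": field,
                    "include": f".*{substring}.*",
                    # Sort by the sub-aggregation first, then alphabetically.
                    "order": [{"key_length": "asc"}, {"_key": "asc"}],
                    "size": size,
                },
                "aggs": {
                    # All docs in a bucket share one key, so max == that key's length.
                    "key_length": {
                        "max": {"script": f"doc['{field}'].value.length()"}
                    }
                },
            }
        },
    }


def expected_order(keys):
    """Local stand-in for the bucket order: shortest key first, ties alphabetical."""
    return sorted(keys, key=lambda k: (len(k), k))


if __name__ == "__main__":
    body = build_length_sorted_terms_agg("foo.raw", "bar", 10)
    print(json.dumps(body, indent=2))
    print(expected_order(["aa_bar_aa", "bbb_bar_bbb", "z_bar_z", "dd_bar_dd"]))
```

The printed body can be sent as the JSON payload of a normal _search request with whatever client you already use.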
