I am querying ES with a Terms aggregation to find the first N unique values of a string field foo where the field contains a substring bar, and the document matches some other constraints.
Currently I am able to sort the results by the key string alphabetically:
{
"query": {other constraints},
"aggs": {
"my_values": {
"terms": {
"field": "foo.raw",
"include": ".*bar.*",
"order": {"_key": "asc"},
"size": N
}
}
}
}
This gives results like
{
...
"aggregations": {
"my_values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 145,
"buckets": [
{
"key": "aa_bar_aa",
"doc_count": 1
},
{
"key": "iii_bar_iii",
"doc_count": 1
},
{
"key": "z_bar_z",
"doc_count": 1
}
]
}
}
}
How can I change the order option so that the buckets are sorted by the length of the strings in the foo key field, so that the results are like
{
...
"aggregations": {
"my_values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 145,
"buckets": [
{
"key": "z_bar_z",
"doc_count": 1
},
{
"key": "aa_bar_aa",
"doc_count": 1
},
{
"key": "iii_bar_iii",
"doc_count": 1
}
]
}
}
}
This is desired because a shorter string is closer to the search substring so is considered a 'better' match so should appear earlier in the results than a longer string. Any alternative way to sort the buckets by how similar they are to the original substring would also be helpful.
I need the sorting to occur in ES so that I only have to load the top N results from ES.
I worked out a way to do this. I used a sub-aggregation per dynamic bucket to calculate the length of the key string as another field. Then I was able to sort by this new length field first, then by the actual key so keys of the same length are sorted alphabetically.
This gave me results like
which is what I wanted.