Is there an efficient way to get unique terms from an Elasticsearch index?


My aim is to store all unique terms, along with their MD5 hashes, in a database. I have an index of 1 million documents containing ~400,000 unique terms. I got this figure using a cardinality aggregation in Elasticsearch:

GET /dt_index/document/_search
{
  "aggregations": {
    "my_agg": {
      "cardinality": {
        "field": "text"
      }
    }
  }
}
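
Note that the cardinality aggregation only returns an approximate count. Also, since I don't need the search hits themselves, I believe setting "size": 0 in the request body suppresses the 10 hits that would otherwise be returned:

GET /dt_index/document/_search
{
  "size": 0,
  "aggregations": {
    "my_agg": {
      "cardinality": {
        "field": "text"
      }
    }
  }
}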

I can get the unique terms using the following:

GET /dt_matrix/document/_search
{
  "aggregations": {
    "my_agg": {
      "term": {
        "field": "text",
        "size": 100
      }
    }
  }
}

This gives me 10 search results along with a terms aggregation of 100 unique terms. But pulling a single JSON response containing all ~400,000 terms would take a lot of memory. For search results we can iterate batch by batch using scan/scroll; is there a similar way to iterate through all unique terms without loading them into memory at once?
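For reference, scrolling over the documents themselves looks something like this (it pages through hits, but not through aggregation buckets):

GET /dt_index/document/_search?scroll=1m
{
  "size": 1000,
  "query": {
    "match_all": {}
  }
}

Each response returns a _scroll_id, which is passed to the _search/scroll endpoint to fetch the next batch.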


2 Answers


You can't scan/scroll through aggregation results. Instead, you should index these unique terms into a separate index (or type) at indexing time, and then paginate over that index normally.
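If you are on a newer Elasticsearch version (the composite aggregation was added in 6.1), you can also page through all term buckets without maintaining a separate index. A sketch (the source name "text_value" is an arbitrary label I chose):

GET /dt_index/document/_search
{
  "size": 0,
  "aggregations": {
    "my_agg": {
      "composite": {
        "size": 1000,
        "sources": [
          { "text_value": { "terms": { "field": "text" } } }
        ]
      }
    }
  }
}

Each response includes an after_key; pass it back as "after" inside the composite block to fetch the next page of buckets.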


Although you can't scroll through aggregations, you can retrieve smaller, more memory-manageable subsets by narrowing your query. For example, request all unique terms starting with the letter "a", then "b", and so on. Adjust the query until the largest subset is an acceptable size.
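For example, assuming the text field holds lowercase terms, the terms aggregation's include parameter accepts a regular expression that restricts the buckets to one prefix at a time:

GET /dt_matrix/document/_search
{
  "size": 0,
  "aggregations": {
    "my_agg": {
      "terms": {
        "field": "text",
        "include": "a.*",
        "size": 100000
      }
    }
  }
}

Repeat with "b.*", "c.*", and so on. Newer Elasticsearch versions also support splitting the buckets directly, e.g. "include": { "partition": 0, "num_partitions": 20 }, which avoids having to guess at prefix sizes.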