How to count the most frequently occurring words in a set of documents, then perform sub-aggregations

64 Views Asked by Suomynona At 01 July 2025 at 18:15

An Elasticsearch query of mine returns around 5,000 documents. Now I'm trying to determine which non-stop words appear most often in them (stop words being auxiliary verbs and other non-significant words).

So I tried this query using the significant_text aggregation:

$params2 = [
    'index' => ["web", "print"],
    'type'  => 'index',
    'from'  => 0,
    'size'  => 10000,
    'filter_path' => ['aggregations'],
    'body'  => [
        "query" => //omitted query here
        'aggs' => [
            'SIGNIFICANT' => [
                "significant_text" => [
                    "field" => "content"
                ]
            ]
        ]
    ]
];

Unfortunately, the results still include some garbage words that are not significant to me.

My questions:
1. Is there an alternative to the significant_text aggregation?

2. How can I run a terms sub-aggregation under the significant_text aggregation? I want a single query that finds the popular words and then breaks the matching documents down by other fields.

I would greatly appreciate any ideas on how to achieve this process and output.
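
One possible direction, offered as a sketch rather than a confirmed solution: the Elasticsearch documentation recommends wrapping significant_text in a sampler aggregation and lists `filter_duplicate_text` as an option for reducing noise from repeated passages. It also describes significant_text as not supporting child aggregations and suggests a follow-up query instead. The arrays below illustrate that two-step approach; the second step uses a `filters` aggregation with one `match` filter per word, which is a substitution chosen here, not something taken from the original query. The field name `category`, the example words, the `shard_size` value, and the `match_all` placeholder query are all assumptions.

```php
<?php
// Sketch only: request arrays in the same shape as $params2 above.
// 'category', the example words, and shard_size are hypothetical values.

// Step 1: run significant_text on a sample of the top-matching documents
// and skip near-duplicate passages, which may cut down on noisy terms.
$significantParams = [
    'index'       => ['web', 'print'],
    'size'        => 0, // only the aggregations are needed, not the hits
    'filter_path' => ['aggregations'],
    'body'        => [
        'query' => ['match_all' => new \stdClass()], // stand-in for the omitted query
        'aggs'  => [
            'SAMPLE' => [
                'sampler' => ['shard_size' => 200],
                'aggs'    => [
                    'SIGNIFICANT' => [
                        'significant_text' => [
                            'field'                 => 'content',
                            'filter_duplicate_text' => true,
                        ],
                    ],
                ],
            ],
        ],
    ],
];

// Step 2: take the words returned by step 1 (hard-coded here for illustration)
// and build one named filter per word, each with a terms sub-aggregation.
$words = ['election', 'budget']; // hypothetical results from step 1

$filters = [];
foreach ($words as $word) {
    $filters[$word] = ['match' => ['content' => $word]];
}

$breakdownParams = [
    'index'       => ['web', 'print'],
    'size'        => 0,
    'filter_path' => ['aggregations'],
    'body'        => [
        'query' => ['match_all' => new \stdClass()], // same stand-in query as step 1
        'aggs'  => [
            'BY_WORD' => [
                'filters' => ['filters' => $filters],
                'aggs'    => [
                    // Assumes 'category' is a keyword field that can be aggregated.
                    'BY_CATEGORY' => ['terms' => ['field' => 'category']],
                ],
            ],
        ],
    ],
];
```

Each array would be passed to the client's `search()` method in turn, with the step 1 bucket keys feeding `$words` for step 2. Splitting the work into two requests keeps the word discovery and the per-field breakdown separate, so the second request only pays for the words that turned out to be useful.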
