How to count the most frequently occurring words in a set of documents, then perform sub-aggregations


An Elasticsearch query of mine returns, let's say, around 5000 documents. Now I'm trying to determine which non-stop words appear most often in them (stop words being auxiliary verbs and other non-significant words).

So I tried this query, using the significant_text aggregation:

$params2 = [
    'index' => ["web", "print"],
    'type'  => 'index',
    'from'  => 0,
    'size'  => 10000,
    'filter_path' => ['aggregations'],
    'body'  => [
        "query" => [ /* query omitted here */ ],
        'aggs' => [
            'SIGNIFICANT' => [
                "significant_text" => [
                    "field" => "content"
                ]
            ]
        ]
    ]
];

Unfortunately, it still returns some garbage words that are not significant to me.

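One way the noise might be reduced, sketched here under assumptions rather than as a confirmed fix: wrap significant_text in a sampler aggregation, enable filter_duplicate_text, and exclude known unwanted terms. The function name, the shard_size value, the excluded words, and the example query are all hypothetical placeholders; the real query is not shown in this question.

```php
<?php
// Sketch: build a request body that samples the top-scoring documents
// before running significant_text. All concrete values are placeholders.
function buildSampledSignificantText(array $query, array $excludeTerms, int $shardSize = 200): array
{
    $significantText = [
        'field' => 'content',
        // Skip near-duplicate passages (boilerplate, reposts) when counting.
        'filter_duplicate_text' => true,
    ];
    if ($excludeTerms !== []) {
        // Exact terms that should never be returned as buckets.
        $significantText['exclude'] = $excludeTerms;
    }

    return [
        'query' => $query,
        'aggs'  => [
            'SAMPLE' => [
                // Analyse only the top-scoring documents from each shard.
                'sampler' => ['shard_size' => $shardSize],
                'aggs'    => [
                    'SIGNIFICANT' => ['significant_text' => $significantText],
                ],
            ],
        ],
    ];
}

// Hypothetical usage with a stand-in query and stand-in excluded words.
$body = buildSampledSignificantText(
    ['match' => ['content' => 'example']],
    ['said', 'also']
);
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting array would go under the 'body' key of the request parameters. With the sampler wrapper, the significant terms appear under aggregations.SAMPLE.SIGNIFICANT in the response.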

My Questions:
1. Is there an alternative to the significant_text aggregation?

2. I also want to run a terms sub-aggregation under the significant_text aggregation, because I want a single query that finds the popular words and then filters the documents according to other fields.

I would greatly appreciate any ideas on how to achieve this process and output.
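For context on the second question: the Elasticsearch documentation notes that significant_text intentionally does not support child aggregations and suggests a follow-up request over the returned keywords instead. Below is a minimal sketch of such a second request, using a filters aggregation with one match query per keyword and a terms sub-aggregation. The function name and the grouping field passed in are hypothetical, and the grouping field is assumed to be a keyword-type field.

```php
<?php
// Sketch: second request that breaks down each keyword by another field.
// $keywords would come from the bucket keys of the significant_text response
// and is assumed to be non-empty.
function buildKeywordBreakdown(array $query, array $keywords, string $groupField): array
{
    $filters = [];
    foreach ($keywords as $keyword) {
        // One named bucket per keyword, matched against the analysed text field.
        $filters[$keyword] = ['match' => ['content' => $keyword]];
    }

    return [
        'size'  => 0, // only the aggregations are needed, not the hits
        'query' => $query,
        'aggs'  => [
            'BY_KEYWORD' => [
                'filters' => ['filters' => $filters],
                'aggs'    => [
                    // Document counts per value of the other field, per keyword.
                    'BY_FIELD' => ['terms' => ['field' => $groupField]],
                ],
            ],
        ],
    ];
}
```

Each bucket under BY_KEYWORD then holds the documents containing that keyword, split by the values of the chosen field. This costs a second round trip, but it avoids running sub-aggregations over the large set of candidate terms.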
