From an Elasticsearch query, I get around 5000 documents. Now I'm trying to determine which non-stop words appear the most (stop words being auxiliary verbs and other non-significant words).
So I tried this query using the significant_text aggregation:
$params2 = [
    'index' => ['web', 'print'],
    'type' => 'index',
    'from' => 0,
    'size' => 10000,
    'filter_path' => ['aggregations'], // return only the aggregations part of the response
    'body' => [
        'query' => [ /* omitted query here */ ],
        'aggs' => [
            'SIGNIFICANT' => [
                'significant_text' => [
                    'field' => 'content',
                ],
            ],
        ],
    ],
];
Unfortunately, it still returns some garbage words that are not significant to me.
My Questions:
1. Is there an alternative to the significant_text aggregation?
2. I also want to run a terms sub-aggregation under this significant_text main aggregation, because I want a single query that finds the popular words and then filters the documents according to the other fields.
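To make the second point concrete, this is roughly the shape I have in mind. It is only a sketch: `category` is a hypothetical keyword field standing in for one of my other fields, `match_all` is a stand-in for the omitted query, and I am assuming that significant_text accepts sub-aggregations the same way other bucket aggregations do.

```php
<?php
// Sketch only, not a verified solution.
// Assumptions: 'category' is a hypothetical keyword field, and
// 'match_all' stands in for the real (omitted) query.
$params3 = [
    'index' => ['web', 'print'],
    'type' => 'index',
    'size' => 0, // only the aggregations are needed, not the hits
    'filter_path' => ['aggregations'],
    'body' => [
        'query' => ['match_all' => new \stdClass()],
        'aggs' => [
            'SIGNIFICANT' => [
                'significant_text' => [
                    'field' => 'content',
                    // skip near-duplicate passages when counting terms
                    'filter_duplicate_text' => true,
                ],
                // nested under each significant word bucket
                'aggs' => [
                    'BY_CATEGORY' => [
                        'terms' => ['field' => 'category'],
                    ],
                ],
            ],
        ],
    ],
];
```

Each bucket under SIGNIFICANT would then carry its own BY_CATEGORY buckets, so every popular word comes with a breakdown by the other field.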
I would greatly appreciate any ideas on how to achieve this process and output.