I am trying to get the total number of tokens in documents that match a query. I haven't defined any custom mapping and the field for which I want to get the token count is of type 'string'.
I tried the following query, but it returns a very large number, on the order of 10^20, which is not the correct answer for my dataset.
curl -XPOST 'localhost:9200/nodename/comment/_search?pretty' -d '
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "tk_count": {
      "sum": {
        "script": "_index[\"body\"].sumttf()"
      }
    }
  },
  "size": 0
}'
Any idea how to get the correct count of all tokens? (I do not need the count for each term, only the total count.)
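It seems like you want to retrieve the cardinality (the number of distinct tokens) of the body field. In that case, you can use the cardinality aggregation; note that the count it returns is approximate. For detailed information, see the official Elasticsearch documentation on the cardinality aggregation.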
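A minimal sketch of such a request, reusing the index/type path (`nodename/comment`) and the `body` field from the question; the aggregation name `distinct_tokens` is an arbitrary, hypothetical choice. It assumes an older (1.x/2.x-era) cluster where an analyzed `string` field can be aggregated on directly, so the distinct count is taken over the field's tokens. On 5.x and later, a `text` field needs `fielddata` enabled for this to work, and aggregating on a `keyword` sub-field instead would count whole values rather than tokens.

```shell
# Index/type path and field name taken from the question; adjust for your cluster.
ES_URL='localhost:9200/nodename/comment/_search?pretty'

# match_all query plus a cardinality aggregation on the analyzed "body" field.
# "size": 0 skips returning hits, so only the aggregation result comes back.
BODY='{
  "query": { "match_all": {} },
  "aggs": {
    "distinct_tokens": {
      "cardinality": { "field": "body" }
    }
  },
  "size": 0
}'

# The Content-Type header is required from Elasticsearch 6.0 onward.
# Print a message instead of aborting if the cluster cannot be reached.
curl -XPOST "$ES_URL" -H 'Content-Type: application/json' -d "$BODY" \
  || echo "request to $ES_URL failed" >&2
```

The approximate distinct count appears under `aggregations.distinct_tokens.value` in the response; the optional `precision_threshold` setting of the cardinality aggregation trades memory for accuracy.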