I am new to Elasticsearch. I tried to get result from ES using CData Elasticsearch ODBC driver. Is it possible to get sum of score field?
My code:
OdbcConnection connection = new OdbcConnection("Driver={CData ODBC Driver for Elasticsearch};server=localhost");
connection.Open();
string query = "select sum(_score) from ordersdetails";
OdbcCommand odbcCommand = new OdbcCommand(query, connection);
OdbcDataReader dataReader = odbcCommand.ExecuteReader();
DataTable dataTable = new DataTable();
dataTable.Load(dataReader);
connection.Close();
I have faced the below exception
System.Data.Odbc.OdbcException: 'ERROR [HY000] The '_score' column is not applicable to the sum function.'
But the below query returns result:
"select _id, sum(_score) from ordersdetails group by _id"
Anybody know, Why I got exception when tried to get a result for a single column?
If you know the solution, please share with me.
After doing a couple of experiments with
pyodbc
and ElasticSearch I come to these conclusions:_score
and does not let the user to do so_score
is most likely a bug and is performed not by ElasticSearch but by the driver.In short, don't use
_score
for anyGROUP BY
, it's a special feature of ElasticSearch dedicated to relevance sorting.A bit of introduction
As I already mentioned in the comments to the question,
_score
in ElasticSearch is a measure of how document is relevant to a given query (see docs):This field is not a part of the document and is computed for every query and every document. In ElasticSearch
_score
is used for sorting. However,_score
is not always computed, for instance when sorting on an existing field is required:Since this field is computed on-the-fly, it is not possible to create efficient aggregation, hence ElasticSearch does not allow this directly. However, this still can be achieved by using scripts in the aggregations.
CData ODBC Driver is aware of _score field
CData ODBC Driver is aware of
_score
field:Basically, this means that by explicitly mentioning
_score
in your query will make ODBC return such field (which might to be there by default).The experiments
I installed pyodbc and set up ElasticSearch 5.4 at my localhost. I tuned ES to log all the queries it receives.
1.
At first I reproduced the first case:
And received this exception:
In the log of ES I catched this query:
2.
Next I took the second query:
Which executed without exceptions, but resulted in this ES query:
3.
Then I tried to mock the library with non existing fields:
In this case exception was different:
Although the query sent to ES was the same as in the first case.
4.
Then I tried to find out how does the library send aggregate requests:
In fact, it did use ES aggregations:
Conclusions
The fact that the library is able to recognize
_score
as a special keyword, and also because it did not attempt to generate ES aggregations when asked forsum(_score)
, I assume that it does not allow in general to do aggregations on_score
and the "working" case here is likely a bug.