I am using Elasticsearch to query data. I search for a medical term and get back the corresponding disease code. Here is my sample query:

from elasticsearch import Elasticsearch
es = Elasticsearch()  # assumes a node on localhost:9200
es.search(index="myindex",
          body={"query": {"match": {"text_field": "search_term"}}},
          search_type="dfs_query_then_fetch")
# Expected output - ABC
# Local output - ABC
# Output on Amazon EMR - XYZ

The problem is that when I run it in the cloud, the output is completely different.

I have exactly the same index in the cloud and locally, yet the results in the cloud are wrong. We run it on an Amazon EMR instance, where I have even tried re-creating the index, with no luck.
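One quick way to check whether the two indices really are identical is to compare their document counts on both clusters before comparing query results. This is a minimal sketch, assuming both clusters are reached through elasticsearch-py clients; the function name `compare_doc_counts` is hypothetical.

```python
def compare_doc_counts(local_es, remote_es, index):
    """Return (local_count, remote_count) for `index` on two clusters.

    Both clients are assumed to expose elasticsearch-py's `indices.refresh()`
    and `count()` methods. Refreshing first makes recently indexed documents
    visible to the count.
    """
    counts = []
    for es in (local_es, remote_es):
        es.indices.refresh(index=index)
        counts.append(es.count(index=index)["count"])
    return tuple(counts)
```

Different counts mean the indices are not the same. Equal counts do not prove identical contents, but they rule out the most common case of a partially built index.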

Local OS: Ubuntu 16.04
OS on Amazon EMR: Amazon Linux

Any help would be really appreciated.

Best answer:

Thanks to everyone who responded to my question.

I figured out what the problem was.

A bootstrap script running on AWS starts the Elasticsearch service and, in parallel, runs my Python index-creation script.

Because the cluster takes some time to come up, a few of the indexing requests time out during index creation. As a result, the index is only partially built, which explains the differing results.
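A minimal sketch of one way to avoid this race, assuming the index-creation script uses an elasticsearch-py client: poll cluster health until the node reports `yellow` or `green`, and only then start indexing. The function name `wait_for_cluster` and its parameters are hypothetical; the injectable `sleep` argument exists only to make the helper easy to test.

```python
import time


def wait_for_cluster(es, timeout_s=120.0, interval_s=2.0, sleep=time.sleep):
    """Poll cluster health until the status is yellow or green.

    `es` is assumed to be an elasticsearch-py client (or any object with the
    same `cluster.health()` method). Returns True once the cluster is usable,
    False if roughly `timeout_s` seconds of polling pass first (time spent
    inside the health call itself is not counted).
    """
    waited = 0.0
    while waited < timeout_s:
        try:
            status = es.cluster.health()["status"]
            if status in ("yellow", "green"):
                return True
        except Exception:
            # Node not accepting connections yet, or a transient error: retry.
            pass
        sleep(interval_s)
        waited += interval_s
    return False
```

In the bootstrap flow, the index creator would call `wait_for_cluster(es)` and exit with an error if it returns False, instead of sending requests to a node that is still starting. It is also worth failing loudly on rejected documents; for example, `elasticsearch.helpers.bulk` raises `BulkIndexError` by default when any document fails to index.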

I hope this helps anyone running Elasticsearch on Amazon EMR.

Cheers!

Another answer:

Try using the "preference" parameter while querying the data. Something like this:

es.search(index="myindex",
    body={"query": {"match": {"text_field": "search_term"}}},
    preference="_primary_first"
)

Update: Some values, such as "_primary_first", have been deprecated as of Elasticsearch 6.x and are removed entirely in Elasticsearch 7.0.