Get documents matching a condition using the Elasticsearch Java API


As far as I know, Elasticsearch can parse documents, and when we search for a keyword it returns the matching documents with this Java API code:

  org.elasticsearch.action.search.SearchResponse searchHits =  node.client()
            .prepareSearch()
            .setIndices("indices")
            .setQuery(qb)
            .setFrom(0).setSize(1000)
            .addHighlightedField("file.filename")
            .addHighlightedField("content")
            .addHighlightedField("meta.title")
            .setHighlighterPreTags("<span class='badge badge-info'>")
            .setHighlighterPostTags("</span>")
            .addFields("*", "_source")
            .execute().actionGet();

Now my question is: suppose some documents contain strings like these:

Jun 2010 to Sep 2011                First Document          
Jun 2009 to Aug 2011                Second Document             
Nov 2011 – Sep 2012                 Third Document   
Nov  2012- Sep 2013                 Forth Document   
Nov 2013 – Current                  First Document   
June 2014 – Feb 2015                Third Document   
Jan 2013 – Jan 2014                 Second Document   
July 2008 – Oct 2012                First Document   
May 2007 – Current                  Forth Document   

Now I want the documents that fall within these ranges:

1 to 12 months
13-24 months
26-48 months

How can I do this?

On BEST ANSWER

If you index documents in this form, Elasticsearch will not be able to parse those strings as dates correctly. Even if you transform the strings into correctly formatted timestamps, the only way to run the query you propose is to index the documents in this format

{
  "start": "2010-09",
  "end": "2011-10",
  // rest of the document
}

and subsequently run a script-filtered query over them, compiling a script that calculates the difference between those two dates with one of the scripting languages Elasticsearch provides. Bear in mind that script filtering and scoring is always much slower than a simple index lookup.

A much faster and cleaner way to do this is to index the duration of the period alongside the start and end dates, like so

{
  "start": "2010-09",
  "end": "2011-10",
  "duration": 13
  // the rest of the document
}
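
The duration has to be computed before indexing, since Elasticsearch will not derive it from the raw strings. A minimal sketch of that step in plain Java follows, assuming the period text has already been isolated from the rest of the document and that "Current" should be measured against a reference month supplied by the caller. The class name `PeriodDuration` and its methods are hypothetical helpers, not part of the Elasticsearch API.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Elasticsearch class): turns a period string
// such as "Jun 2010 to Sep 2011" into a month count for the "duration" field.
class PeriodDuration {

    // Month names are matched on their first three letters, so both
    // "Jun" and "June" resolve to month 6.
    private static final List<String> MONTHS = Arrays.asList(
            "jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec");

    // Accepts "to", an en dash (\u2013) or a hyphen as the separator,
    // and either "<month> <year>" or the literal "Current" as the end.
    private static final Pattern PERIOD = Pattern.compile(
            "([A-Za-z]{3,})\\s+(\\d{4})\\s*(?:to|\\u2013|-)\\s*"
            + "(?:([A-Za-z]{3,})\\s+(\\d{4})|Current)");

    static YearMonth toYearMonth(String monthName, String year) {
        String key = monthName.substring(0, 3).toLowerCase(Locale.ROOT);
        int index = MONTHS.indexOf(key);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown month: " + monthName);
        }
        return YearMonth.of(Integer.parseInt(year), index + 1);
    }

    // Returns the number of whole months between start and end.
    // "now" is only used when the period ends with "Current".
    static long durationInMonths(String period, YearMonth now) {
        Matcher m = PERIOD.matcher(period.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised period: " + period);
        }
        YearMonth start = toYearMonth(m.group(1), m.group(2));
        YearMonth end = (m.group(3) == null)
                ? now
                : toYearMonth(m.group(3), m.group(4));
        return ChronoUnit.MONTHS.between(start, end);
    }

    public static void main(String[] args) {
        YearMonth reference = YearMonth.now();
        System.out.println(durationInMonths("Jun 2010 to Sep 2011", reference));
        System.out.println(durationInMonths("Nov  2012- Sep 2013", reference));
        System.out.println(durationInMonths("May 2007 \u2013 Current", reference));
    }
}
```

The count is the plain difference between the two months, which matches the example above (2010-09 to 2011-10 gives 13). If both endpoints should be counted, add one before indexing. Documents ending in "Current" would need to be re-indexed periodically, because their duration keeps growing.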

If you index your documents in this form, you can simply perform a filtered query on the duration field:

{
   "query":{
      "filtered":{
         "filter":{
            "range":{
               "duration":{
                  "gte":1,
                  "lte":12
               }
            }
         }
      }
   }
}
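
The same request can be generated for each of the three ranges in the question. The sketch below builds the body as a plain string with a single `range` clause carrying both bounds, which selects the same documents as two separate `gte` and `lte` ranges combined. `DurationRangeQuery` is a hypothetical helper, and the `filtered` query it emits is assumed to be supported by the Elasticsearch version in use (it was deprecated in 2.0 and removed in 5.0 in favour of a `bool` query with a `filter` clause).

```java
import java.util.Locale;

// Hypothetical helper (not an Elasticsearch class): builds the JSON body
// for a filtered range query on the "duration" field.
class DurationRangeQuery {

    // Both bounds are inclusive, mirroring "gte" and "lte".
    static String build(int minMonths, int maxMonths) {
        if (minMonths > maxMonths) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return String.format(Locale.ROOT,
                "{\"query\":{\"filtered\":{\"filter\":{\"range\":"
                + "{\"duration\":{\"gte\":%d,\"lte\":%d}}}}}}",
                minMonths, maxMonths);
    }

    public static void main(String[] args) {
        // The three ranges from the question, in months.
        int[][] buckets = {{1, 12}, {13, 24}, {26, 48}};
        for (int[] bucket : buckets) {
            System.out.println(build(bucket[0], bucket[1]));
        }
    }
}
```

Each generated string can be sent as the search request body. Building the query with the Java API's own query builders would avoid manual JSON assembly, but the exact builder classes depend on the client version.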