Lucene Sample Query


When I search for the phrase "ph1 ph2", Lucene finds texts that contain "ph1" or "ph2", not just texts that contain the whole phrase.

String line = "ph1 ph2";           
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
Query query = parser.parse(line);  

Does anybody know how to search by: 1) an exact phrase ("ph1 ph2"), which should match e.g. "This is sentence ph1 ph2."; 2) a phrase with a maximum distance ("ph1 ph2 ~3"), which should match e.g. "This ph1 is sentence ph2."?

P.S. I used the standard Lucene indexer to index my files. If this example is not clear, see http://www.lucenetutorial.com/lucene-query-syntax.html

Here's the full code:

String index = "C:/programs/lucenedemo/index";
String field = "contents";
IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
//QueryParser parser = new QueryParser(Version.LUCENE_40, field, analyzer);
String line = "ph1 ph2";
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
Query query = parser.parse(line);
//doPagingSearch(searcher, query, hitsPerPage, raw, queries == null && queryString == null);
//doPagingSearch

TopDocs results = searcher.search(query, 300000);
ScoreDoc[] hits = results.scoreDocs;
System.out.println(results.totalHits);

// Print at most the first 10 hits; the bound guards against fewer than 10 results.
for (int i = 0; i < Math.min(10, hits.length); i++) {
    Document doc = searcher.doc(hits[i].doc);
    String path = doc.get("path");
    if (path != null) System.out.println((i + 1) + ". " + path);
}

//end of doPagingSearch
reader.close();

There are 2 best solutions below


I'm not clear on exactly what you are looking for, but I believe it's one of:

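  • "field:\"" + line + "\"" : A simple phrase query. Matches only when the two terms are adjacent and in order.

  • "field:\"" + line + "\"~3" : A phrase query with slop. The terms may be up to three position moves apart. In-order matches need the fewest moves; swapping two adjacent terms already uses up two of them.

  • "field:(" + line + ")" : Not a phrase query at all. A plain search for the two terms, in any order and at any distance. With the parser's default OR operator, a document containing either term matches, which is the behavior you are seeing now.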

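In each case, field stands for the name of the indexed field ("contents" in your code); the prefix can be omitted when that field is the parser's default. You can see further options in Lucene's query parser syntax documentation.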
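As a rough sketch of the first two options, the query strings could be assembled with small helpers before being handed to parser.parse. The class and method names below are made up for illustration and are not part of Lucene, and the escaping of backslashes and double quotes assumes the classic query parser's backslash-escape rule.

```java
// Illustrative helpers (hypothetical names, not Lucene API) that build
// query-parser strings for an exact phrase and a phrase with slop.
class PhraseQueryStrings {

    // Wrap the text in double quotes so the parser treats it as one phrase.
    // Backslashes and quotes inside the text are escaped first, so they
    // cannot terminate the phrase early (assumes backslash escaping).
    static String exactPhrase(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Append ~N to the quoted phrase to allow up to N position moves.
    static String sloppyPhrase(String text, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be >= 0");
        }
        return exactPhrase(text) + "~" + slop;
    }

    // Optionally restrict a clause to a named field, e.g. contents:"ph1 ph2".
    static String inField(String field, String clause) {
        return field + ":" + clause;
    }

    public static void main(String[] args) {
        String line = "ph1 ph2";
        System.out.println(exactPhrase(line));                       // "ph1 ph2"
        System.out.println(sloppyPhrase(line, 3));                   // "ph1 ph2"~3
        System.out.println(inField("contents", exactPhrase(line)));  // contents:"ph1 ph2"
        // In the question's code, the result would replace the raw line:
        // Query query = parser.parse(sloppyPhrase(line, 3));
    }
}
```

With the question's parser, whose default field is already "contents", the inField prefix should not be needed.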


You may want to use a SpanQuery.

Specifically, you can create a SpanNearQuery, passing its constructor an array of SpanTermQuery objects (one for each term in the phrase), an int giving the "slop", or maximum distance, and a boolean indicating whether the terms must appear in order.
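To make the slop and in-order parameters concrete, here is a simplified plain-Java model of an in-order near match for two terms. It is not Lucene code and not how SpanNearQuery is implemented: it assumes whitespace tokenization, lowercase query terms, and that slop counts the tokens allowed between the two terms.

```java
import java.util.Locale;

// Simplified model of an in-order "near" match for two terms.
// Assumptions: tokens are split on whitespace, positions are array indexes,
// and the query terms are passed in lowercase.
class NearMatchSketch {

    // True if `first` occurs before `second` with at most `slop`
    // other tokens between them.
    static boolean orderedNear(String text, String first, String second, int slop) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) {
                continue;
            }
            // Only look at positions that keep the gap within the slop.
            int last = Math.min(tokens.length - 1, i + 1 + slop);
            for (int j = i + 1; j <= last; j++) {
                if (tokens[j].equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "This ph1 is sentence ph2";
        System.out.println(orderedNear(text, "ph1", "ph2", 3)); // true: two tokens in between
        System.out.println(orderedNear(text, "ph1", "ph2", 0)); // false: terms are not adjacent
    }
}
```

Under these assumptions, the question's second example has two tokens between "ph1" and "ph2", so any slop of 2 or more would accept it.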

To search, call the getSpans method on the query that you have created.

Note that this will give you a list of all such occurrences, not a list of matching documents. Depending on how you would like to present the results, you may need to iterate over the spans and group them by document. If you only need the matching documents, a SpanQuery is also a Query, so you can pass it to IndexSearcher.search like any other query.
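The grouping step might look roughly like the sketch below. SpanHit is a hypothetical holder for the document id and start/end positions of one occurrence; how those values are read from Lucene's Spans object depends on the Lucene version, so that part is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Groups span occurrences by document. Plain Java only; the SpanHit class
// is a made-up container, not a Lucene type.
class SpanGrouping {

    // One occurrence: document id plus start (inclusive) and end (exclusive) positions.
    static final class SpanHit {
        final int doc;
        final int start;
        final int end;

        SpanHit(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    // Collect occurrences per document id; TreeMap keeps ids in ascending order.
    static Map<Integer, List<SpanHit>> groupByDoc(List<SpanHit> hits) {
        Map<Integer, List<SpanHit>> byDoc = new TreeMap<>();
        for (SpanHit hit : hits) {
            byDoc.computeIfAbsent(hit.doc, id -> new ArrayList<>()).add(hit);
        }
        return byDoc;
    }

    public static void main(String[] args) {
        List<SpanHit> hits = Arrays.asList(
                new SpanHit(0, 1, 5), new SpanHit(3, 2, 4), new SpanHit(0, 7, 9));
        for (Map.Entry<Integer, List<SpanHit>> entry : groupByDoc(hits).entrySet()) {
            System.out.println("doc " + entry.getKey() + ": "
                    + entry.getValue().size() + " occurrence(s)");
        }
    }
}
```

Each key of the resulting map is one matching document, so the map's key set gives the document list and the values give the positions to highlight.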