Lucene searching using payload and NLP tags

Question


345 Views Asked by igopimac13 At 05 June 2025 at 16:04

I have already indexed my documents so that each word carries a payload containing its part-of-speech (POS) tag. I want to retrieve only those documents in which the query words have a given POS tag. For example, in 'access google', google is a noun, so the search should return only documents where google appears as a noun. Would writing a custom analyzer help? And how can I access the term while the payload is being read in the Similarity class?


There are 3 best solutions below

**Mark Giaconia** · Answer 1

Mark Giaconia On 11 December 2013 at 13:52

Doing exact (:google AND :'noun') queries in Lucene can be tricky. What is your query, and how are you writing the docs to the index?

**fatih** · Answer 2

fatih On 07 January 2014 at 13:23

I would recommend using span queries. Span queries can return a Spans object, which allows you to inspect the payload of every matching token.

See PayloadTermQuery.
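
The idea of checking the payload at every matching position can be modeled in plain Java, independent of any Lucene version. This is only a sketch of the matching rule, not the Spans API: `TokenOccurrence`, `PayloadFilter`, and `docsWithTag` are hypothetical names, and the list of occurrences stands in for what a span query would enumerate.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of payload inspection; none of these types are Lucene classes.
final class TokenOccurrence {
    final int docId;
    final String term;
    final byte[] payload; // POS tag bytes, or null when the token has no payload

    TokenOccurrence(int docId, String term, String tag) {
        this.docId = docId;
        this.term = term;
        this.payload = tag == null ? null : tag.getBytes(StandardCharsets.UTF_8);
    }
}

final class PayloadFilter {
    private PayloadFilter() {}

    // Returns the ids of documents where `term` occurs at least once with `tag`.
    static List<Integer> docsWithTag(List<TokenOccurrence> postings, String term, String tag) {
        byte[] wanted = tag.getBytes(StandardCharsets.UTF_8);
        List<Integer> hits = new ArrayList<>();
        for (TokenOccurrence occ : postings) {
            boolean tagMatches = occ.payload != null && Arrays.equals(occ.payload, wanted);
            if (occ.term.equals(term) && tagMatches && !hits.contains(occ.docId)) {
                hits.add(occ.docId);
            }
        }
        return hits;
    }
}
```

A document qualifies as soon as one occurrence of the term carries the wanted tag; occurrences with a different tag or with no payload are skipped.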

**Debasis** · Answer 3

You can use the PayloadAttribute class to store the tags as payloads and then override the scorePayload method of the DefaultSimilarity class to make use of the tags. In your case, you would return 1 if the tag is "noun" and 0 otherwise.

The following snippet sets the payload on a token:

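    import java.nio.charset.StandardCharsets;
    import org.apache.lucene.util.BytesRef;

    // Assumes Lucene 4.x, where payloads are BytesRef values (3.x used a Payload class).
    String tag = "noun";
    byte[] payload = tag.getBytes(StandardCharsets.UTF_8); // explicit charset, matches utf8ToString()
    payloadAttr.setPayload(new BytesRef(payload));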
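
Since the tag is written as bytes at index time and read back as a string at search time, both sides have to agree on the encoding. A minimal sketch of that round trip in plain Java follows; `PosPayloadCodec` is a hypothetical helper, not a Lucene class, and the offset/length form of `decode` is modeled on the `bytes`, `offset`, and `length` fields that a `BytesRef` exposes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: converts a POS tag to and from payload bytes.
final class PosPayloadCodec {
    private PosPayloadCodec() {}

    // Encode with an explicit charset so indexing and search agree on the bytes.
    static byte[] encode(String tag) {
        return tag.getBytes(StandardCharsets.UTF_8);
    }

    // Decode a slice of a larger buffer, given its start offset and length.
    static String decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

Calling `tag.getBytes()` without a charset uses the platform default, which may differ between the machine that builds the index and the one that searches it.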

Next, make use of the tags during retrieval. This has to be done by extending the DefaultSimilarity class:

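    class PayloadSimilarity extends DefaultSimilarity {
    ...
    ...
    // scorePayload is public in DefaultSimilarity, so the override cannot be protected.
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        // Defensive check in case no payload is present for this position.
        if (payload == null) {
            return 0;
        }
        String payloadData = payload.utf8ToString();
        return payloadData.equals("noun") ? 1 : 0;
    }
    ...
    ...
    }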

Finally, set your extended class as the searcher's similarity at retrieval time. Note that scorePayload is only consulted by payload-aware queries such as PayloadTermQuery.

    searcher.setSimilarity(new PayloadSimilarity());
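
A similarity only changes scores, so a term whose tag is not "noun" may still appear in the results with a score of zero rather than being excluded. If that turns out to be the case in your Lucene version (an assumption worth verifying), one option is to drop zero-scored hits after the search. The sketch below uses a hypothetical `Hit` class as a stand-in for a search hit; Lucene's `ScoreDoc` carries similar `doc` and `score` fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search hit, holding a document id and its score.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class ZeroScoreFilter {
    private ZeroScoreFilter() {}

    // Keeps only hits whose score is strictly positive, preserving their order.
    static List<Hit> positiveOnly(List<Hit> hits) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.score > 0f) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```

This kind of post-filtering only works if the payload factor multiplies into the final score, so that a payload score of 0 forces the whole score to 0.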
