I need to extend a REST API written in Java with Spring that queries a MarkLogic database. I already have this working with StructuredQueryBuilder and the search method from DocumentManagerImpl (package com.marklogic.client.impl), but the client now expects highlighted fragments of the results that match the searched phrases in Polish, including words derived from the same stems (there may be several keywords, and all of them must occur together in a result).
- What is the simplest way to extend the search query, using the MarkLogic Java API, so that a single request to the database also returns the locations of the searched phrases in the returned documents?
- Should I install a custom stemming dictionary in MarkLogic? Are there any sources MarkLogic recommends for obtaining dictionaries?
You can get snippets with highlighting through the Java API by passing a SearchHandle to the search and then reading the match locations and snippets of each result.
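A minimal sketch, assuming the MarkLogic Java Client API (marklogic-client-api) is on the classpath and a REST app server is reachable. The host, port, credentials and the two Polish keywords ("umowa", "najem") are hypothetical placeholders. It combines the keywords with an and-query so that all of them must occur in the same document, runs the query through QueryManager.search with a SearchHandle (whose default response includes highlighted snippets), and prints the path of each match location with the highlighted pieces wrapped in `<em>` tags. Whether inflected Polish forms match depends on the database's stemming configuration, which this code does not change.

```java
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.query.MatchDocumentSummary;
import com.marklogic.client.query.MatchLocation;
import com.marklogic.client.query.MatchSnippet;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.StructuredQueryBuilder;
import com.marklogic.client.query.StructuredQueryDefinition;

public class HighlightSearch {

    // Joins the pieces of one match, wrapping the highlighted pieces in <em>...</em>.
    static String markHighlights(MatchSnippet[] snippets) {
        StringBuilder sb = new StringBuilder();
        for (MatchSnippet s : snippets) {
            if (s.isHighlighted()) {
                sb.append("<em>").append(s.getText()).append("</em>");
            } else {
                sb.append(s.getText());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical connection details: replace with your own host, port and credentials.
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("user", "password"));
        try {
            QueryManager queryMgr = client.newQueryManager();
            StructuredQueryBuilder qb = queryMgr.newStructuredQueryBuilder();
            // and() requires every keyword (hypothetical examples) to occur in the same document.
            StructuredQueryDefinition query = qb.and(qb.term("umowa"), qb.term("najem"));
            // The SearchHandle receives the search response, including the snippets.
            SearchHandle results = queryMgr.search(query, new SearchHandle());
            for (MatchDocumentSummary summary : results.getMatchResults()) {
                System.out.println(summary.getUri());
                // Each match location carries the path of the matching node and its snippet pieces.
                for (MatchLocation location : summary.getMatchLocations()) {
                    System.out.println("  " + location.getPath() + ": "
                            + markHighlights(location.getSnippets()));
                }
            }
        } finally {
            client.release();
        }
    }
}
```

DocumentManager.search also has an overload that takes a SearchHandle, which could return the documents and the search metadata in one round trip; whether snippets are included there by default is an assumption to check against your query options.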
Custom dictionaries are covered at https://docs.marklogic.com/guide/search-dev/custom-dictionaries. I believe that once you've created a dictionary, you'll want to modify the Language setting on your database to use the new dictionary (I have not tried that before, but that appears to be the expected approach).
As for a Polish dictionary, https://developer.marklogic.com/code/dictionaries-and-thesauri/ links to a repository of dictionaries, but it does not include a Polish one. Building a complete dictionary would be a significant effort; if you are mostly interested in stemming for certain keywords, though, you could build a custom dictionary containing just those keywords, their inflected forms, and their stems.
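A small keyword-only dictionary can be generated from a map of inflected forms to stems. This sketch uses only the JDK and assumes the `cdict:dictionary` / `cdict:entry` / `cdict:word` / `cdict:stem` format and the `http://marklogic.com/xdmp/custom-dictionary` namespace described in the custom dictionaries guide; verify both against the guide for your MarkLogic version. The word forms are hypothetical examples. Installing the resulting XML for the `pl` language is a separate step covered in that guide (e.g. via `cdict:dictionary-write`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PolishDictionaryBuilder {

    // Namespace of MarkLogic custom dictionaries (assumption: taken from the guide, verify for your version).
    static final String CDICT_NS = "http://marklogic.com/xdmp/custom-dictionary";

    // Escapes the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds one <cdict:entry> per word form, mapping the form to its stem.
    static String buildDictionary(Map<String, String> formToStem) {
        StringBuilder sb = new StringBuilder();
        sb.append("<cdict:dictionary xmlns:cdict=\"").append(CDICT_NS).append("\">\n");
        for (Map.Entry<String, String> e : formToStem.entrySet()) {
            sb.append("  <cdict:entry><cdict:word>").append(escape(e.getKey()))
              .append("</cdict:word><cdict:stem>").append(escape(e.getValue()))
              .append("</cdict:stem></cdict:entry>\n");
        }
        sb.append("</cdict:dictionary>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: a few inflected forms of "umowa" (contract), all mapped to one stem.
        Map<String, String> forms = new LinkedHashMap<>();
        forms.put("umowa", "umowa");
        forms.put("umowy", "umowa");
        forms.put("umowę", "umowa");
        forms.put("umowie", "umowa");
        System.out.print(buildDictionary(forms));
    }
}
```

Mapping every form to the same stem is what lets a stemmed search for one form match the others; the map can be filled from whatever list of keywords and inflections you maintain.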