Is MarkLogic search with position possible?


There is some explanation of a use case below; the actual question follows.

I am using ML search queries on some documents that contain elements of the form:

<resource>
  <version>
   <metadata label="author">Jim</metadata>
   ...
  </version>
  <version>
   <metadata label="author">John</metadata>
   ...
  </version>
</resource>

Note the versioning of metadata. The uppermost version element contains the up-to-date info for the document.

The queries are based on user input; for example, the user looks for documents whose author is John.

I am not knowledgeable enough to combine attribute value and element/text value queries in a better way than this:

cts:near-query((cts:element-attribute-value-query(xs:QName("metadata"), xs:QName("label"), "author"), cts:element-value-query(xs:QName("metadata"), "John")), 0)

It does work, so I am fine with it. What doesn't work is restricting the match to the latest version in the resource (/resource/version[1]). If, at some point, the "author" was changed from "John" to "Jim", the document with the resource shown above will still be found by a search for "John", because I don't know how to look only at values in the latest (uppermost) version element. So I have to filter the results once more with XPath in a loop.

Is there a way to do this on an ML search query level?


There are 2 best solutions below


You could create a field whose path points to the metadata element with @label="author" inside the first version element: /resource/version[1]/metadata[@label="author"]

Then you could search that named field (called "author" here) with cts:field-value-query():

cts:search(doc(), cts:field-value-query("author", "John"))
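
Before defining the field, it may be worth checking that the server accepts the path expression, since index and field paths only support a subset of XPath. A minimal sketch, assuming that cts:valid-index-path applies the same path grammar that field paths use (the second argument asks it to ignore namespace resolution):

```xquery
xquery version "1.0-ml";

(: Path taken from the answer above. :)
let $path := '/resource/version[1]/metadata[@label="author"]'
(: Evaluates to true if the path is permissible as an index path. :)
return cts:valid-index-path($path, fn:true())
```

If this evaluates to false, the path would need to be simplified before it can be used in a field definition.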

With XPath alone, someone (you or MarkLogic) has to take the hit of filtering on the value you want. This is the case even when you use a searchable expression with the filtered option. Such is the nature of repeating elements in a document.

The most efficient way is to index the path in question separately and then query against that value.

Some options:

  • TDE template to extract the value. Extremely powerful and likely my choice, but it steps away from your simple example, so I will pass on it here.
  • Range index. Nice, but it is memory-mapped and pays off when you want range queries; your query is a simple value query, so we will skip this.
  • Field. Simple, elegant, and what is needed here. Define the XPath to exactly what you want, and a second-pass indexing will pull that value out and index it separately with its own indexing rules. You can then query this value.

Note the semicolons (;) in the code below. They separate three executions: (1) field creation, (2) document insert, and (3) search.

   xquery version "1.0-ml";
   import module namespace admin = "http://marklogic.com/xdmp/admin"
          at "/MarkLogic/admin.xqy";

   (: (1) Create the field, unless a field with this name already exists :)
   let $config := admin:get-configuration()
   let $dbid := xdmp:database("Documents")
   let $field-name := "latest-resource"

   return
     if (empty(admin:database-get-fields($config, $dbid)[./*:field-name = $field-name]))
     then
       let $field-spec :=
         <field xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://marklogic.com/xdmp/database">
           <field-name>{$field-name}</field-name>
           <field-path>
             <path>/resource/version[./metadata/@label="author"][1]</path>
             <weight>1.0</weight>
           </field-path>
           <word-searches>false</word-searches>
           <field-value-searches>true</field-value-searches>
         </field>
       (: Saving the configuration applies the new field to the database :)
       let $_ := admin:save-configuration(admin:database-add-field($config, $dbid, $field-spec))
       return ()
     else ();
    
   (: (2) Insert two sample documents :)
   (
     xdmp:document-insert("/sample/jim-first.xml", <resource>
       <version>
         <metadata label="author">Jim</metadata>
       </version>
       <version>
         <metadata label="author">John</metadata>
       </version>
     </resource>),
     xdmp:document-insert("/sample/john-first.xml", <resource>
       <version>
         <metadata label="author">John</metadata>
       </version>
       <version>
         <metadata label="author">Jim</metadata>
       </version>
     </resource>)
   );
         
   (: (3) Search the field for the value "Jim" :)
   cts:search(doc(), cts:field-value-query("latest-resource", "Jim"))

In this case, only the document where Jim is in the first version is returned:

<resource>
  <version>
    <metadata label="author">Jim</metadata>
  </version>
  <version>
   <metadata label="author">John</metadata>
  </version>
</resource>
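
As a counter-check, the same field can be queried for "John". This is a minimal sketch that assumes the "latest-resource" field and the two sample documents from above are in place and indexed; it lists the URIs of the matches instead of the documents themselves. If the field behaves as described, I would expect only /sample/john-first.xml to match, since that is the only document with John in the first version.

```xquery
xquery version "1.0-ml";

(: Assumes the "latest-resource" field and the sample documents defined above. :)
for $doc in cts:search(fn:doc(), cts:field-value-query("latest-resource", "John"))
return xdmp:node-uri($doc)
```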