Why are the results of my Optic API query not what I'm expecting when sorting and filtering?

404 Views Asked by At

What I'm trying to accomplish:

I'm currently working on a xquery script to retrieve data of a marklogic database in form of a html table with the possibility to order and filter every column. The data is distributed in multiple directories as different documents with foreign keys just like a relational database would hold the data. Now that might not be the optimal solution, but I sadly can't change that for now.

What's the problem:

The first solution was via cts:search on the main directory (smlaws) which came with the option to filter and sort for all elements and that worked quite well. But some information does not exist in those documents, so I've had to make multiple other queries to retrieve all necessary data and then sort and filter afterwards. That worked fine with a few records but not with 40.000.

Well I'm now trying to find a solution with TDEs and the optic api of marklogic. At first this looked great, but as soon as i started filtering and ordering I was not getting the expected results.

The results are so random, I don't understand what's going on. Directly after the document-insert all records are found. After a few minutes they are not anymore. If I order by a column I'm still getting all expected results. If I then order again by that column, but descending, i get zero results. This behaviour seems to change randomly.

I've checked if marklogic finished indexing.

I've created an element-range-index for each column that's filtered/sorted.

    {
      "localname": "identifier",
      "scalar-type": "string",
      "namespace-uri": "",
      "collation": "http://marklogic.com/collation/en/S1",
      "range-value-positions": false,
      "invalid-values": "ignore"
    }

Also I'd like to find a better way for performing a like-condition. The script is mostly dynamic, that means with the current solution I have to check myself, if the user input (filters) are fine to use. Filtering with cts:element-word-query worked better, but the results were unfiltered which might be a problem of missing indicies, but I'm not sure if I have to had additional ones.

Simplified xquery script:

xquery version "1.0-ml";
import module namespace op = "http://marklogic.com/optic" at "/MarkLogic/optic.xqy";

let $smlawsview := op:from-view("main", "smlaws")
let $sfpublicationchannelsview := op:from-view("main", "sfpublicationchannels")
let $sfstatusview := op:from-view("main", "sfstatus")
let $smlawareasview := op:from-view("main", "smlawareas")

let $maps := $smlawsview
let $maps := op:where($maps, op:sql-condition(fn:concat("smlaws.identifier LIKE '%identifier%'")))
let $maps := op:join-inner($maps, $sfpublicationchannelsview, op:on("fksfpublicationchannels", op:view-col("sfpublicationchannels", "pksfpublicationchannels")))
let $maps := op:where($maps, op:eq(op:view-col("sfpublicationchannels", "fksfstatus"), "1"))
let $maps := op:join-inner($maps, $sfstatusview, op:on(op:view-col("sfpublicationchannels", "fksfstatus"), op:view-col("sfstatus", "pksfstatus")))
let $maps := op:join-inner($maps, $smlawareasview, op:on(op:view-col("smlaws", "fksmlawareas"), op:view-col("smlawareas", "pksmlawareas")))

let $maps := op:select($maps, (
    op:view-col("smlaws", "pksmlaws"),
    op:view-col("smlaws", "fksfpublicationchannels"),
    op:view-col("smlaws", "fksmlawareas"),
    op:view-col("smlaws", "identifier"),
    op:view-col("sfpublicationchannels", "fksfstatus"),
    op:view-col("sfstatus", "pksfstatus"),
    op:view-col("sfstatus", "identifier"),
    op:view-col("smlawareas", "pksmlawareas"),
    op:view-col("smlawareas", "identifier")
))

let $maps := op:order-by($maps, op:asc(op:view-col("smlaws", "identifier")))

let $maps := op:offset-limit($maps, op:param("offset"), op:param("limit"))

let $maps := op:result($maps, (),
        map:entry("offset", 0)
        => map:with("limit", 10)
)

return for $map in $maps
return element row {
    element id {map:get($map, "main.smlaws.pksmlaws")},
    element fksfpublicationchannels {map:get($map, "main.smlaws.fksfpublicationchannels")},
    element identifier {map:get($map, "main.smlaws.identifier")},
    element status {map:get($map, "main.sfstatus.pksfstatus")},
    element fksfstatus {map:get($map, "main.sfstatus.identifier")},
    element lawarea {map:get($map, "main.smlawareas.pksmlawareas")},
    element fksmlawareas {map:get($map, "main.smlawareas.identifier")}
}


Example documents:

<smlaws>
    <id>1</id>
    <identifier>law 1</identifier>
    <fksfpublicationchannels>1</fksfpublicationchannels>
    <fksmlawareas>1</fksmlawareas>
</smlaws>
<sfpublicationchannels>
    <id>1</id>
    <identifier>publicationchannel 1</identifier>
    <fksfstatus>1</fksfstatus>
</sfpublicationchannels>
<sfstatus>
    <id>1</id>
    <identifier>status 1</identifier>
</sfstatus>
<smlawareas>
    <id>1</id>
    <identifier>lawarea 1</identifier>
</smlawareas>

Example tde of smlaws:

<tde:template xmlns:tde='http://marklogic.com/xdmp/tde' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://marklogic.com/xdmp/tde '>
  <tde:description>TDE view for smlaws</tde:description>
  <tde:context>/envelope/instance/smlaws</tde:context>
  <tde:enabled>true</tde:enabled>
  <tde:rows>
    <tde:row>
      <tde:schema-name>main</tde:schema-name>
      <tde:view-name>smlaws</tde:view-name>
      <tde:columns>
        <tde:column>
          <tde:name>pksmlaws</tde:name>
          <tde:scalar-type>int</tde:scalar-type>
          <tde:val>id</tde:val>
          <tde:nullable>false</tde:nullable>
          <tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
        </tde:column>
        <tde:column>
          <tde:name>fksmlawareas</tde:name>
          <tde:scalar-type>int</tde:scalar-type>
          <tde:val>fksmlawareas</tde:val>
          <tde:nullable>true</tde:nullable>
          <tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
        </tde:column>
        <tde:column>
          <tde:name>identifier</tde:name>
          <tde:scalar-type>string</tde:scalar-type>
          <tde:val>identifier</tde:val>
          <tde:nullable>true</tde:nullable>
          <tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
        </tde:column>
        <tde:column>
          <tde:name>fksfpublicationchannels</tde:name>
          <tde:scalar-type>int</tde:scalar-type>
          <tde:val>fksfpublicationchannels</tde:val>
          <tde:nullable>false</tde:nullable>
          <tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
        </tde:column>
      </tde:columns>
    </tde:row>
  </tde:rows>
</tde:template>

Questions:

  1. How can I find out what's wrong in my setup (be it indicies, tde's or xquery, ...) that I'm not getting proper results?
  2. Is there a better solution for what I'm trying to accomplish?
  3. Is there a better way of performing the like-condition?
1

There are 1 best solutions below

2
On

For the variability in the result set, the best step is likely to raise a support ticket to start an investigation.

For user-supplied filtering criteria, consider

op:where(op:sql-condition(...criteria...))

as described here: https://docs.marklogic.com/op:sql-condition

If the filtering criteria can be applied prior to some of the joins, the query might perform better by the following sequence of operations:

  1. join as need for the input to the filter
  2. apply the filter
  3. order on the filtered rows
  4. limit on the ordered rows
  5. do the rest of the joins
  6. order the final output

By joining a smaller data set, that reduces the work. If the columns used for filtering could be denormalized into the source documents for the smlaws view so filtering could happen before any joins, that would minimize the number of joins.

Hoping that helps,