cts:search is returning unexpected results for unfiltered search option when wildcard is enabled

I am running cts:search with the "unfiltered" option and with wildcard search enabled (that is, passing the "wildcarded" option).

My database contains five XML documents, which I have pasted below.

In the cts:query below, if the value passed for the journalTitle element contains a wildcard (*), the search returns all five documents.

for example: "d*", "di*", "dixi*"

Even when I pass "mohi*t" as the value for the journalTitle element, which matches none of the documents, I get all five documents in the result.

With the "filtered" option it works fine.

Why does this happen, and how can I correct it for the "unfiltered" option?

I have searched Google extensively but did not find a solution.

Please find the cts:search query and the XML files below.

cts:query

cts:search(fn:collection(), cts:element-query(
        xs:QName("root"), 
        cts:and-query(
          (
            cts:element-value-query(xs:QName("sourceType"), "JA", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1), 
            cts:element-value-query(xs:QName("journalTitle"), "mohi*t", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1), 
            cts:element-value-query(xs:QName("title"), "title1", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1), 
            cts:element-value-query(xs:QName("volume"), "volume0", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1)
          ), 
          ()
        ), 
        ()
       ),"unfiltered")

XML content (all five documents):

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <journalTitle>Dinesh</journalTitle>
    <sourceType>JA</sourceType>
    <title>title1</title>
    <volume>volume0</volume>
</root>
-
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <journalTitle>Dixit</journalTitle>
    <sourceType>JA</sourceType>
    <title>title1</title>
    <volume>volume0</volume>
</root>
-
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <journalTitle>Prashant</journalTitle>
    <sourceType>JA</sourceType>
    <title>title1</title>
    <volume>volume0</volume>
</root>
-
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <journalTitle>GAYARI</journalTitle>
    <sourceType>JA</sourceType>
    <title>title1</title>
    <volume>volume0</volume>
</root>
-
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <journalTitle>KEVAL</journalTitle>
    <sourceType>JA</sourceType>
    <title>title1</title>
    <volume>volume0</volume>
</root>

You might need the xdmp:plan result, so I have pasted it below.

xdmp:plan result:

<qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
    <qry:info-trace>xdmp:eval("xdmp:plan(cts:search(fn:collection(), cts:element-query(&amp;#10;   ...", (), &lt;options xmlns="xdmp:eval"&gt;&lt;database&gt;12874763000056740838&lt;/database&gt;&lt;root&gt;C:\RSuite\modules...&lt;/options&gt;)</qry:info-trace>
    <qry:info-trace>Analyzing path for search: fn:collection()</qry:info-trace>
    <qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace>
    <qry:info-trace>Path is fully searchable.</qry:info-trace>
    <qry:info-trace>Gathering constraints.</qry:info-trace>
    <qry:info-trace>Search query contributed 1 constraint: cts:element-query(fn:QName("", "root"), cts:and-query((cts:element-value-query(fn:QName("", "sourceType"), "JA", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1), cts:element-value-query(fn:QName("", "journalTitle"), "mohi*t", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1), cts:element-value-query(fn:QName("", "title"), "title1", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1), cts:element-value-query(fn:QName("", "volume"), "volume0", ("case-insensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","wildcarded","lang=en"), 1)), ()), ())</qry:info-trace>
    <qry:partial-plan>
        <qry:or-two-queries>
            <qry:element-query>
                <qry:key>10866465315185201428</qry:key>
                <qry:annotation>element(root)</qry:annotation>
                <qry:and-query>
                    <qry:term-query weight="1">
                        <qry:key>15329831187071590131</qry:key>
                        <qry:annotation>element(sourceType,value("JA"))</qry:annotation>
                    </qry:term-query>
                    <qry:term-query weight="0">
                        <qry:key>3029765743981997321</qry:key>
                        <qry:annotation>element(journalTitle)</qry:annotation>
                    </qry:term-query>
                    <qry:term-query weight="1">
                        <qry:key>4206353216190327061</qry:key>
                        <qry:annotation>element(title,value("title1"))</qry:annotation>
                    </qry:term-query>
                    <qry:term-query weight="1">
                        <qry:key>7729558342335907080</qry:key>
                        <qry:annotation>element(volume,value("volume0"))</qry:annotation>
                    </qry:term-query>
                </qry:and-query>
            </qry:element-query>
            <qry:and-two-queries>
                <qry:term-query weight="0">
                    <qry:key>837267169796541076</qry:key>
                    <qry:annotation>link-child(descendant(element(root)))</qry:annotation>
                </qry:term-query>
                <qry:and-query>
                    <qry:term-query weight="1">
                        <qry:key>15329831187071590131</qry:key>
                        <qry:annotation>element(sourceType,value("JA"))</qry:annotation>
                    </qry:term-query>
                    <qry:term-query weight="0">
                        <qry:key>3029765743981997321</qry:key>
                        <qry:annotation>element(journalTitle)</qry:annotation>
                    </qry:term-query>
                    <qry:term-query weight="1">
                        <qry:key>4206353216190327061</qry:key>
                        <qry:annotation>element(title,value("title1"))</qry:annotation>
                    </qry:term-query>
                    <qry:term-query weight="1">
                        <qry:key>7729558342335907080</qry:key>
                        <qry:annotation>element(volume,value("volume0"))</qry:annotation>
                    </qry:term-query>
                </qry:and-query>
            </qry:and-two-queries>
        </qry:or-two-queries>
    </qry:partial-plan>
    <qry:info-trace>Executing search.</qry:info-trace>
    <qry:final-plan>
        <qry:and-query>
            <qry:or-two-queries>
                <qry:element-query>
                    <qry:key>10866465315185201428</qry:key>
                    <qry:annotation>element(root)</qry:annotation>
                    <qry:and-query>
                        <qry:term-query weight="1">
                            <qry:key>15329831187071590131</qry:key>
                            <qry:annotation>element(sourceType,value("JA"))</qry:annotation>
                        </qry:term-query>
                        <qry:term-query weight="0">
                            <qry:key>3029765743981997321</qry:key>
                            <qry:annotation>element(journalTitle)</qry:annotation>
                        </qry:term-query>
                        <qry:term-query weight="1">
                            <qry:key>4206353216190327061</qry:key>
                            <qry:annotation>element(title,value("title1"))</qry:annotation>
                        </qry:term-query>
                        <qry:term-query weight="1">
                            <qry:key>7729558342335907080</qry:key>
                            <qry:annotation>element(volume,value("volume0"))</qry:annotation>
                        </qry:term-query>
                    </qry:and-query>
                </qry:element-query>
                <qry:and-two-queries>
                    <qry:term-query weight="0">
                        <qry:key>837267169796541076</qry:key>
                        <qry:annotation>link-child(descendant(element(root)))</qry:annotation>
                    </qry:term-query>
                    <qry:and-query>
                        <qry:term-query weight="1">
                            <qry:key>15329831187071590131</qry:key>
                            <qry:annotation>element(sourceType,value("JA"))</qry:annotation>
                        </qry:term-query>
                        <qry:term-query weight="0">
                            <qry:key>3029765743981997321</qry:key>
                            <qry:annotation>element(journalTitle)</qry:annotation>
                        </qry:term-query>
                        <qry:term-query weight="1">
                            <qry:key>4206353216190327061</qry:key>
                            <qry:annotation>element(title,value("title1"))</qry:annotation>
                        </qry:term-query>
                        <qry:term-query weight="1">
                            <qry:key>7729558342335907080</qry:key>
                            <qry:annotation>element(volume,value("volume0"))</qry:annotation>
                        </qry:term-query>
                    </qry:and-query>
                </qry:and-two-queries>
            </qry:or-two-queries>
        </qry:and-query>
    </qry:final-plan>
    <qry:info-trace>Selected 5 fragments</qry:info-trace>
    <qry:result estimate="5"/>
</qry:query-plan>

Apologies for any grammatical errors.

If you need more info please let me know.

There are 2 best solutions below

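Wildcard searches rely on either the appropriate index or filtering. Your xdmp:plan output shows the symptom: the journalTitle constraint is resolved only to element(journalTitle) with weight 0, so the index lookup merely checks that the element exists, and an unfiltered search returns every document that has it. Did you check that you have enabled fast element trailing wildcard searches, and maybe also trailing wildcard searches, on your database? That will work for patterns with at least four starting characters. For three starting characters, you also need to enable fast element character searches, and maybe also three character searches.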
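As a rough sketch, the index options named above can also be switched on with the Admin API instead of the Admin UI. The database name "Documents" is a placeholder (an assumption, not taken from the question), and the query needs to run as a user with admin privileges. Already loaded documents only benefit once they have been reindexed.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name; substitute your own. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()

(: Trailing wildcard indexes, for patterns such as "dixi*". :)
let $config := admin:database-set-trailing-wildcard-searches($config, $db, fn:true())
let $config := admin:database-set-fast-element-trailing-wildcard-searches($config, $db, fn:true())

(: Character indexes, for patterns with three consecutive
   non-wildcard characters such as "mohi*t". :)
let $config := admin:database-set-three-character-searches($config, $db, fn:true())
let $config := admin:database-set-fast-element-character-searches($config, $db, fn:true())

(: Persist the changed configuration. :)
return admin:save-configuration($config)
```

After the reindex finishes, running xdmp:plan on the original query again should show whether the journalTitle term is now resolved from an index rather than from element(journalTitle) alone.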

MarkLogic also allows accurate unfiltered wildcard searches for patterns that start with just one or two characters. One way is to enable the two character searches and one character searches options, but according to the documentation you don't need them if you enable three character searches in combination with a word lexicon:

two character searches specifies whether indexes should be created to enable wildcard searches where the search pattern contains two consecutive non-wildcard character (for example, ab*). This index is not needed if you have three character searches and a word lexicon.

one character searches specifies whether indexes should be created to enable wildcard searches where the search pattern contains a single non-wildcard character (for example, a*). This index is not needed if you have three character searches and a word lexicon.

(source: Admin UI Help tab)
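To see which of these options are currently active before changing anything, a read-only check along the following lines may help. It is only a sketch: "Documents" is again a placeholder database name, and the labels in the output are made up for readability.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name; substitute your own. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()
return (
  (: Each getter returns the current boolean setting for the database. :)
  fn:concat("one character searches: ",
    admin:database-get-one-character-searches($config, $db)),
  fn:concat("two character searches: ",
    admin:database-get-two-character-searches($config, $db)),
  fn:concat("three character searches: ",
    admin:database-get-three-character-searches($config, $db)),
  (: Number of word lexicons configured on the database. :)
  fn:concat("word lexicons configured: ",
    fn:count(admin:database-get-word-lexicons($config, $db)))
)
```

Nothing is saved here, so the query is safe to run against a production configuration.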

Thanks to Dave for pointing to "Understanding the Wildcard Indexes", which explains all of this in further detail.

HTH!

It is better to enable a word lexicon with a codepoint collation in conjunction with three character searches. The one and two character indexes are very expensive.
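A minimal sketch of that setup with the Admin API, assuming a placeholder database named "Documents" and that no codepoint word lexicon is configured yet (adding one that already exists may raise an error):

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name; substitute your own. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()

(: Word lexicon using the codepoint collation. :)
let $lexicon :=
  admin:database-word-lexicon("http://marklogic.com/collation/codepoint")
let $config := admin:database-add-word-lexicon($config, $db, $lexicon)

(: Three character searches, to be used together with the lexicon. :)
let $config := admin:database-set-three-character-searches($config, $db, fn:true())

return admin:save-configuration($config)
```

With this combination in place, the one and two character search options can stay disabled, in line with the Admin UI help text quoted in the other answer.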