Value-Query within an Element-Query



I have a question about using cts:search. Consider the following XML:

<Item Id="07123114-5c14-4ba9-a6ad-7b688feb8706" ...>
...
  <AttributeValue AttributeName="Mounting Application" AttributeGroup="Search_Application">Tank</AttributeValue>
  <AttributeValue AttributeName="Type" AttributeGroup="Search_Type">Pump Mounting Bracket</AttributeValue>
  <AttributeValue AttributeName="Overall Width" AttributeGroup="Search_Width">15/16 "</AttributeValue>
  <AttributeValue AttributeName="Overall Height" AttributeGroup="Search_Height">1-3/8 "</AttributeValue>
...
</Item>

Let's say that I want to look for Items that have Overall Height = 4".

I'm using the following query in cts:search:

cts:search(/tx:Item,
  cts:element-query(xs:QName("tx:AttributeValue"), cts:and-query((
    cts:element-attribute-value-query(xs:QName("tx:AttributeValue"), xs:QName("AttributeName"), "Overall Height"),
    cts:word-query("4 """, "exact"))))
)

This returns every Item whose Overall Height is 4", but also 1/4", 3/4", and so on, because a word-query does a 'contains' match rather than matching the whole value. I want an exact value match. I can't use cts:element-value-query here, because the query is already wrapped in an element-query on AttributeValue: the value is the text of AttributeValue itself, not of a sub-element.

The two alternatives we currently have both involve changing the XML structure: Option 1. Make the value an attribute of the AttributeValue element; Option 2. Make it a child element.

I feel like there has to be a way to get what I'm looking for without changing the XML structure. Please advise.


There are 2 best solutions below


I'd change the XML: use AttributeName to generate meaningful element names.

MarkLogic assumes that your XML has meaningful element and attribute names. This XML structure looks like a perfectly good serialization format. But it's a poor query format because the element names aren't meaningful. It's like a relational database table with three columns: type, group, value. So every query has to join WHERE TYPE=? AND VALUE=?. It's much more efficient to look up values WHERE HEIGHT=?, so MarkLogic pushes you pretty strongly in that direction.

It's technically possible to find ways around this, but you're fighting with the tool. Instead try thinking of the XML as a model for how MarkLogic will build its indexes. When the XML isn't easy to query, change it.
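To make the remodelling idea concrete, here is a minimal sketch of the transformation, written in Python only so it can be run outside MarkLogic. It assumes the simplified, namespace-free sample from the question; the function name `remodel` and the rule of stripping whitespace from AttributeName to form an element name are illustrative assumptions, not something the thread or MarkLogic prescribes.

```python
import re
import xml.etree.ElementTree as ET


def remodel(item):
    """Return a new Item whose AttributeValue children are renamed
    after their AttributeName (hypothetical naming rule)."""
    out = ET.Element(item.tag, dict(item.attrib))
    for child in item:
        if child.tag == "AttributeValue" and "AttributeName" in child.attrib:
            # "Overall Height" -> "OverallHeight": XML names cannot contain spaces.
            name = re.sub(r"\s+", "", child.attrib["AttributeName"])
            renamed = ET.SubElement(
                out, name,
                {"AttributeGroup": child.attrib.get("AttributeGroup", "")},
            )
            renamed.text = child.text
        else:
            # Anything else is carried over unchanged.
            out.append(child)
    return out


# Trimmed-down version of the sample document from the question.
sample = ET.fromstring(
    '<Item Id="07123114-5c14-4ba9-a6ad-7b688feb8706">'
    '<AttributeValue AttributeName="Overall Width" '
    'AttributeGroup="Search_Width">15/16 "</AttributeValue>'
    '<AttributeValue AttributeName="Overall Height" '
    'AttributeGroup="Search_Height">1-3/8 "</AttributeValue>'
    '</Item>'
)

remodeled = remodel(sample)
print(ET.tostring(remodeled, encoding="unicode"))
```

With documents in that shape, the height lookup should no longer need an element-query wrapping two sub-queries; a single cts:element-value-query against the (hypothetical) OverallHeight element would express "the whole value equals 4"" directly.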

0
On

Your fundamental problem is tokenization: the text is split on /, spaces, and other punctuation, so "2/4" contains the word "4", which is exactly what your word-query asks for.

You might be able to get there by creating a field with tokenizer overrides (available in MarkLogic 7) so that / is treated as a word character, and space and " are removed. Then you replace the word-query with a field word query against that field.
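The effect of such an override can be illustrated with a toy model. The Python sketch below is only a rough approximation of word/punctuation/space tokenization and is not MarkLogic's actual tokenizer; `tokenize`, `contains_phrase`, and the `extra_word_chars` parameter are hypothetical names. It models just the "/ becomes a word character" part of the suggestion and leaves out the removal of spaces and quotes.

```python
import re


def tokenize(text, extra_word_chars=""):
    """Split text into word, whitespace, and single-punctuation tokens.
    Characters in extra_word_chars count as part of a word."""
    word = r"[A-Za-z0-9" + re.escape(extra_word_chars) + r"]+"
    return re.findall(word + r"|\s+|.", text)


def contains_phrase(value, phrase, extra_word_chars=""):
    """True if the phrase's tokens appear as a contiguous run in the value's
    tokens, which is roughly what a 'contains'-style word query checks."""
    v = tokenize(value, extra_word_chars)
    p = tokenize(phrase, extra_word_chars)
    return any(v[i:i + len(p)] == p for i in range(len(v) - len(p) + 1))


values = ['4 "', '1/4 "', '3/4 "', '1-3/8 "']

# Default-style tokenization: "/" is punctuation, so 1/4 and 3/4 also match.
default_hits = [v for v in values if contains_phrase(v, '4 "')]

# With "/" treated as a word character, 3/4 is one token and no longer matches.
override_hits = [v for v in values if contains_phrase(v, '4 "', extra_word_chars="/")]

print(default_hits)
print(override_hits)
```

In this toy model the override narrows the matches to the plain 4" value, which is the behaviour the field is meant to achieve; whether a real field configuration behaves the same way would need to be checked against your data.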

I do agree with Michael, however, that you would be doing yourself a favour by taking a harder look at the data modelling.