I'm looking for a way to create a static index to serve filtering/sorting queries for a combination of property values in the nested structure (a collection of objects) along with the structure-container. It seems to be not trivial due to the following reasons:
- If properties of the nested structure/collection are separated out into individual fields of the index (individual collections), then it makes impossible using the
AND
condition when filtering on 2+ properties of the nested structure/collection. - The fanout index complexity (see an example), which makes any solution to run too slow.
Given the following persistent model:
public class Document
{
public string Title { get; set; }
public List<UserChange> RecentModifications { get; set; }
}
where
public class UserChange
{
public string UserId { get; set; }
public DateTime Timestamp { get; set; }
}
Question: How to build the index for Document
to filter/sort by a combination of all the fields: Title
, UserId
and Timestamp
?
Possible use cases:
- get all documents containing the word 'contract' for a certain user and a date range
- sort documents containing the word 'contract' by last modification made by the user.
P.S. I understand that the indexing restrictions can by bypassed by restructuring the persistence model - storing a structure for recently modified documents in the User
document, but it'd impose some other limitations, which I'd like to avoid.
The problem can be solved by using Indexes with Dynamic Fields. It allows to keep logical data structure and avoids creating a fanout index.
Solution
Create the following index for the
Document
collection from the above:Query on
MyIndex
likeExplanation
The specified index with dynamic fields will generate extra fields and terms for each record in the following format:
The format is important, because when you use a dictionary in your high-level query like
myDic[key]
, it gets transformed tomyDic_key
in the generated RQL. Hence, it will allow us to use these fields in queries.If you query with using the usual
Query
rather thanDocumentQuery
(see docs), then you need a proper data type for LINQ to work. For that purpose I createdDocumentIndDto
class, where myRecentModifications
has become a dictionary, so I could use it in a high-level query and get correct RQL likeFor more details, see my discussion on the subject with Oren Eini (aka Ayende Rahien).