If one defines a field in RavenDB for full-text search, it uses an analyzer which tokenizes the field and post-processes the tokens (source). When that field is queried, what happens to the search term in the query? Is it also tokenized and post-processed? If yes, is it tokenized and post-processed by the same analyzer that is used at indexing time? Can the analyzers for indexing and for querying be different?
An example:
Collection:
{"Name": "xxxabcd", "@metadata": {"@collection": "Names"}}
{"Name": "yyyabcd", "@metadata": {"@collection": "Names"}}
Index:
from names in docs.Names
select new {
names.Name
}
Activate Indexing on the Name field to Search and use the NGram analyzer (no idea how to do this in RQL). NGram creates 2-6 character long tokens out of the name (source). So one token will be abcd which is shared by both documents.
Query:
from index "Names/ByName"
where search(Name, "xxxabcd")
The query returns no search results.
If the search term were post-processed into NGrams that include abcd, the query would return both documents, but it does not. So what happens to the search term xxxabcd?
I cannot find any documentation on how search terms on full-text fields are handled.
Usually, the analyzer that is configured in the index definition is run both at indexing time and at query time (the same analyzer).
From:
https://ravendb.net/learn/inside-ravendb-book/reader/4.0/10-static-indexes-and-other-advanced-options#full-text-search-queries
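After investigation:
NGram is an exception to the above rule.
When the NGram analyzer is configured on an index field (x.Name in your case), the StandardAnalyzer is used to tokenize the searched term (xxxabcd) from the query predicate.
The default token length generated by the NGram analyzer is only 2-6 chars.
So in your case, using your example and the default NGram settings, the terms that are generated in the index from your 2 documents are the substrings of 2 to 6 characters of each name (for example xx, abcd and xxabcd). The full 7-character value xxxabcd is not one of them.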
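Now, when you query with search(Name, "xxxabcd"), you will get 0 results because "xxxabcd" is not passed through the NGram analyzer at query time, but through the StandardAnalyzer instead. The StandardAnalyzer keeps it as the single term xxxabcd, which does not exist in the index.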
Note again - this exception is only for the NGram analyzer.
For any other analyzer used - the same analyzer will be used at query time.
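The mismatch described above can be reproduced outside RavenDB with a small simulation. This is only a sketch: the `ngrams` helper is hypothetical and stands in for the NGram analyzer (it assumes plain lowercasing and every substring of 2 to 6 characters), and the query term is simply lowercased to mimic what the StandardAnalyzer does to a single word. It is not RavenDB's or Lucene's actual code.

```python
def ngrams(token, min_gram=2, max_gram=6):
    """Hypothetical stand-in for the NGram analyzer: every substring of
    `token` whose length lies between min_gram and max_gram (inclusive)."""
    token = token.lower()
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }

# Index time: each Name value is expanded into its n-grams.
index_terms = {name: ngrams(name) for name in ("xxxabcd", "yyyabcd")}

# Query time: the search term stays one lowercased token
# (assumed StandardAnalyzer-like behaviour for a single word).
query_term = "xxxabcd".lower()

# A document matches only if the query term is one of its indexed terms.
matches = [name for name, terms in index_terms.items() if query_term in terms]

# Terms that both documents have in common.
shared = index_terms["xxxabcd"] & index_terms["yyyabcd"]

print(matches)           # []
print("abcd" in shared)  # True
```

The 7-character query term is longer than any indexed term, so nothing matches, even though abcd is indexed for both documents.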
The way to go about this is either:
Increase the NGram terms length (configuring 2-7 chars will provide you with results for the search term xxxabcd).
To modify the max NGram value, update the configuration key Indexing.Analyzers.NGram.MaxGram.
This article explains how to modify a configuration key server-wide.
You can use the Edit Index view in the Studio to modify this configuration just for a specific field in a specific index.
Use another analyzer
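To see why raising the maximum gram length helps, the following sketch compares the default 2-6 range with a 2-7 range. The `ngrams` and `search` helpers are hypothetical illustrations (all substrings in the given length range at index time, a single lowercased term at query time), not RavenDB APIs. The last call is an extra observation that follows from the same model rather than an option listed above: a search term of at most 6 characters can already match under the defaults.

```python
def ngrams(token, min_gram, max_gram):
    """Hypothetical NGram stand-in: all substrings with a length
    between min_gram and max_gram (inclusive)."""
    token = token.lower()
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }

def search(names, term, min_gram=2, max_gram=6):
    """Return the names whose indexed n-grams contain the (unsplit) term."""
    term = term.lower()  # query side: one lowercased term, no n-grams
    return [n for n in names if term in ngrams(n, min_gram, max_gram)]

names = ["xxxabcd", "yyyabcd"]

default_hits = search(names, "xxxabcd")             # default 2-6 range
wider_hits = search(names, "xxxabcd", max_gram=7)   # 2-7 range
short_hits = search(names, "abcd")                  # short term, defaults

print(default_hits)  # []
print(wider_hits)    # ['xxxabcd']
print(short_hits)    # ['xxxabcd', 'yyyabcd']
```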
For the sake of having a complete answer:
Index definition is:
Some other resources for FTS are:
Full-Text Search
https://ravendb.net/docs/article-page/6.0/csharp/client-api/session/querying/text-search/full-text-search
Full-Text Search with Index
https://ravendb.net/docs/article-page/6.0/csharp/indexes/querying/searching
Demos
https://demo.ravendb.net/demos/csharp/text-search/fts-with-static-index-single-field
https://demo.ravendb.net/demos/csharp/text-search/fts-with-static-index-multiple-fields