I am trying to integrate NHibernate.Search into a multilingual website. The website contains a class, Article, whose content is multilingual. This is done with a separate class, Article_CultureInfo, which stores the language-specific content. The fields of Article are:
Article
-------
ID
Name
And the fields of Article_CultureInfo are:
Article_CultureInfo
-------
ID
ArticleId
CultureCode
PageTitle
Content
I am using NHibernate.Search.Mapping to map out the field/document information. I would like to incorporate search features like stemming and synonym analysis where possible, based on the language. Is there any way the Lucene Analyser can be specified at run-time rather than at compile time / initialisation?
Say we are analysing the content of PageTitle, which is to be stored in the respective Lucene index. This content can be English, French, Italian, etc., based on the value of CultureCode, so the analyser should change based on this value. I have tried implementing a custom MultilingualAnalyser, but the only data available to it is the string to be analysed, i.e. the value of PageTitle. From that alone, I cannot deduce the language. (I could look into language detection techniques, but that is out of scope: I already know exactly what the language is, and detection would be overkill and not 100% reliable.)
If, apart from the tokens, I also had an instance of the object, I could get the CultureCode value out of it and analyse accordingly. Any ideas would be greatly appreciated. I really wish to avoid using Lucene.Net directly, since NHibernate.Search looks to integrate very nicely.
Thanks!
I've basically implemented a workaround for this. It is quite an overkill, but it works.
I've created a new implementation of IGetter, used for multilingual properties, which I called MultilingualGetter. This is basically the same as the BasicGetter. I couldn't extend from it because for some reason it is sealed, so I copied the code.
What this IGetter does is: when the Get() method is called on it, it is given the target object. This is the instance of the class that contains the property. I check that it implements an interface for multilingual objects which I've created, IMultilingualContentInfo. It then retrieves the current culture from the IMultilingualContentInfo and prepends it to the actual text, e.g. [en]Hello World!.
This text is then passed on to a custom analyzer I created, which parses the culture out of the text and so can deduce the language. It then uses a SnowballFilter to stem the text based on that language.
Below is the code for the Get() method of the custom IGetter implementation, which relies on IMultilingualContentInfo.
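
A minimal sketch of the two string-handling steps described above, not the original listing: tagging the value on the getter side and splitting the tag off again on the analyzer side. It deliberately leaves out the NHibernate and Lucene.Net plumbing. The CultureCode member on IMultilingualContentInfo, the CultureTag helper and the SampleContent class are all assumptions made for illustration; the answer does not show what the real interface looks like.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed shape: the answer names this interface but does not show its members.
public interface IMultilingualContentInfo
{
    string CultureCode { get; }
}

// Hypothetical entity used only to exercise the helper below.
public class SampleContent : IMultilingualContentInfo
{
    public SampleContent(string cultureCode) { CultureCode = cultureCode; }
    public string CultureCode { get; private set; }
}

// Hypothetical helper, not part of NHibernate or Lucene.Net.
public static class CultureTag
{
    // Matches a leading tag such as "[en]" or "[fr-FR]".
    private static readonly Regex Prefix =
        new Regex(@"^\[(?<culture>[A-Za-z]{2,3}(-[A-Za-z0-9]+)*)\]");

    // Getter side: given the owning object and the raw property value,
    // return the value with the culture prepended, e.g. "[en]Hello World!".
    // Values that are not strings, or owners without a culture, pass through unchanged.
    public static object Prepend(object target, object value)
    {
        var info = target as IMultilingualContentInfo;
        var text = value as string;
        if (info == null || text == null || string.IsNullOrEmpty(info.CultureCode))
            return value;
        return "[" + info.CultureCode + "]" + text;
    }

    // Analyzer side: split a tagged string back into culture and text.
    // Returns false (and leaves the text untouched) when no tag is present.
    public static bool TrySplit(string tagged, out string culture, out string text)
    {
        culture = null;
        text = tagged;
        if (tagged == null) return false;

        Match m = Prefix.Match(tagged);
        if (!m.Success) return false;

        culture = m.Groups["culture"].Value;
        text = tagged.Substring(m.Length);
        return true;
    }
}
```

In this sketch, the custom getter's Get(target) would read the property as BasicGetter does and pass the result through CultureTag.Prepend. The analyzer would read its input, call CultureTag.TrySplit, and choose the stemming language from the returned culture, falling back to a default analyzer when no tag is found.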