Ignore special characters in Examine

1k Views Asked by VinnyG At 23 January 2014 at 15:33

In Umbraco, I use Examine to search in the website but the content is in french. Everything works fine except when I search for "Français" it's not the same result as "Francais". Is there a way to ignore those french characters? I try to find a FrenchAnalyser for Leucene/Examine but did not found anything. I use Fuzzy so it return results even if the words is not the same.

Here's the code of my search :

public static ISearchResults Search(string searchTerm)
        {
            var provider = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
            var criteria = provider.CreateSearchCriteria(BooleanOperation.Or);

            var crawl = criteria.GroupedOr(BoostedSearchableFields, searchTerm.Boost(15))
            .Or().GroupedOr(BoostedSearchableFields, searchTerm.Fuzzy(Fuzziness))
            .Or().GroupedOr(SearchableFields, searchTerm.Fuzzy(Fuzziness))
            .Not().Field("umbracoNavHide", "1");

            return provider.Search(crawl.Compile());
        }

Original Q&A

There are 3 best solutions below

Charles Ouellet On 22 May 2014 at 19:58 BEST ANSWER

We ended up using a custom analyer based on the SnowballAnalyzer

public class CustomAnalyzer : SnowballAnalyzer
{
    public CustomAnalyzer() : base("French") { }

    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream result = base.TokenStream(fieldName, reader);

        result = new ISOLatin1AccentFilter(result);

        return result;
    }
}

Israel Ocbina On 26 May 2014 at 05:26

Try using Regex like this below:

var strInput ="Français";
var strToReplace = string.Empty;
var sNewString = Regex.Replace(strInput, "[^A-Za-z0-9]", strToReplace);

I've used this pattern "[^A-Za-z0-9]" to replace all non-alphanumeric string with a blank.

Hope it helps.

Michael Argentini On 13 December 2022 at 17:46

You can actually convert the unicode characters with diacritics to english equivalents using the following method. That will enable you to search for "Français" with the search term "Francais".

public static string RemoveDiacritics(this string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return text;

    text = text.Normalize(NormalizationForm.FormD);
    var chars = text.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark).ToArray();

    return new string(chars).Normalize(NormalizationForm.FormC);
}

Use it on any string like this:

var converted = unicodeString.RemoveDiacritics();

Ignore special characters in Examine

There are 3 best solutions below

Related Questions in C#

Related Questions in UMBRACO

Related Questions in LUCENE.NET

Related Questions in EXAMINE

Trending Questions

Popular # Hahtags

Popular Questions