Solr PatternTokenizerFactory does not work with phrases


I can't get PatternTokenizerFactory to match multiple words at a time. If I use a simple expression such as "^keyword$" and search for "keyword", it works:

<tokenizer class="solr.PatternTokenizerFactory" pattern="^keyword$" group="0" />

"querystring":"keyword",    
"parsedquery":"(+DisjunctionMaxQuery(((title:keyword)^2.0)))/no_coord",

However, the moment the expression has to match a space, it breaks. For example, with the expression "^key.word$", a search for "key word" matches nothing:

<tokenizer class="solr.PatternTokenizerFactory" pattern="^key.word$" group="0" />    

"querystring":"key word",
"parsedquery":"(+())/no_coord",

I can't figure out why this is not working. I am trying to match phrases built up from some clever regex, but can't figure out what's going on.

I've checked the regex in multiple testers and it works. Any help would be greatly appreciated.

I'm using Solr 6.1.


There is 1 solution below


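If you're using edismax, the sow (split on whitespace) parameter is true by default. This makes edismax split the query on whitespace before running it through analysis, so the tokenizer receives "key" and "word" as two separate inputs and never sees the full string "key word". Neither piece matches "^key.word$" on its own, which is why the parsed query comes back empty.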
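To make the effect concrete, here is a rough Python simulation of what happens. It is only a sketch, not Solr code: `pattern_tokenize` and `analyze` are hypothetical helper names, and Python's `re` module stands in for the Java regex engine, which behaves the same way for these simple patterns.

```python
import re


def pattern_tokenize(pattern, text):
    """Rough stand-in for PatternTokenizerFactory with group="0":
    every full match of the pattern becomes one token."""
    return [m.group(0) for m in re.finditer(pattern, text)]


def analyze(pattern, query, sow):
    """Hypothetical model of the query parser feeding the analyzer.

    sow=True  -> each whitespace-separated term is analyzed on its own.
    sow=False -> the whole query string is analyzed in one shot.
    """
    chunks = query.split() if sow else [query]
    tokens = []
    for chunk in chunks:
        tokens.extend(pattern_tokenize(pattern, chunk))
    return tokens


print(analyze(r"^keyword$", "keyword", sow=True))     # ['keyword']
print(analyze(r"^key.word$", "key word", sow=True))   # []
print(analyze(r"^key.word$", "key word", sow=False))  # ['key word']
```

The second call mirrors the empty "(+())" parsed query from the question: once the input is split, no single piece can satisfy a pattern that spans the space.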

The sow Parameter

Split on whitespace: if set to false, whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g. multi-word synonyms and shingles. Defaults to true: text analysis is invoked separately for each individual whitespace-separated term.
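If your Solr version recognizes the parameter, you can try passing sow=false with the request and checking the debug output again. The snippet below is a minimal sketch that only builds such a request URL. The host, the collection name "mycollection", and the qf value are assumptions for illustration (qf is guessed from the "title ... ^2.0" in the parsed query above), and older Solr releases may not support sow at all.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; replace host and collection with your own.
SOLR_BASE = "http://localhost:8983/solr/mycollection/select"


def build_query_url(q, sow=False):
    """Build an edismax request URL with an explicit sow setting."""
    params = {
        "q": q,
        "defType": "edismax",
        "qf": "title^2.0",  # assumed from the parsed query in the question
        "sow": "true" if sow else "false",
        "debugQuery": "true",  # so parsedquery shows up in the response
    }
    return SOLR_BASE + "?" + urlencode(params)


print(build_query_url("key word"))
```

With sow=false, the parsedquery in the debug section should show whether the whole phrase now reaches the tokenizer as a single input.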