In my MySQL database I have this result querying my data:
mysql> select count(*) from emails where email like '%johndoe%';
+----------+
| count(*) |
+----------+
| 102 |
+----------+
1 row in set (15.50 sec)
My data is indexed under Sphinx (Manticore Search actually) with min_word_len = 1. Now, when I search with SphinxQL I get only partial results:
mysql> SELECT count(*) FROM search1 WHERE MATCH('@email johndoe') LIMIT 1000 OPTION max_matches=1000;
+----------+
| count(*) |
+----------+
| 16 |
+----------+
1 row in set (0.00 sec)
Any idea how to match the results MySQL give me? I tried SPH_MATCH_ANY and SPH_MATCH_EXTENDED with the sphinxapi, same results.
I suspect it's mainly due to whole word matching. Sphinx matches whole words. With 'words' defined as per charset_table http://sphinxsearch.com/docs/current/conf-charset-table.html
ie
MATCH('@email johndoe')is only matching addresses withjohndoein them as a word. default charset_table keeps.-and@(common in emails!) all as separators so would match[email protected]or[email protected], but NOT[email protected], the word being indexed ismyjohndoenotjohndoeWhereas mysql 'LIKE' will happy match part words. eg
email like '%johndoe%'would[email protected],[email protected]and[email protected]or whatever. Its a pure substring match.In short might want to tweak
charset_table. could.-and@all be word chars, so email would be whole word.alternatively might just enable part word matching with
min_infix_len. http://sphinxsearch.com/docs/current.html#conf-min-infix-lenthen could do
MATCH('@email *johndoe*')which would get much closer results.complementary to
min_infix_lenwould beexpand_keywordshttp://sphinxsearch.com/docs/current.html#conf-expand-keywordsthen the * wildcards would be added automatically, so could go back to
MATCH('@email johndoe')