Our DB contains a lot of entries with comma in their titles (as in Hungarian, it's the decimal separator instead of period) and would like to match those with the right relevance. The search SQL looks like this currently in case of the user inputted terms are 7,5x20 otherTerm
SELECT (MATCH(title) AGAINST('(+7,5x20* +otherTerm* ) (7,5x20* otherTerm* ) (+7,5x20 +otherTerm )' IN BOOLEAN MODE)) AS Relevance,
id, title, product_id FROM versions
WHERE (MATCH(title) AGAINST('(+7,5x20* +otherTerm* ) (7,5x20* otherTerm* ) (+7,5x20 +otherTerm )' IN BOOLEAN MODE))
ORDER BY Relevance DESC LIMIT 50
Now the result order gives a higher relevance for eg. 5x20 than 7,5x20 so there has to be some kind of character escaping has to be done on the comma, preventing MySQL to handle them as separated strings. I didn't find the right one.
Thanks for any help in advance.
Edit: disassembling title into more digestible data is currently not an option. Really looking for solution escaping the comma or replacing it with 'match any single character' operator like dot in regex.
FULLTEXT
indexing is not designed to handle numbers, regardless of the Locale for the numbers.One approach is to alter the incoming text to replace punctuation that you want to treat as "letters" into, say,
_
. (And build a separate column for storing this altered text into. Then add theFULLTEXT
index to it instead of the "real" text.)Please note that
+x
will fail in a bad way -- one character strings are not indexed, so they cannot be found. So including strings that are two short leads to returning zero matches.Alterations to the saved text (eg,
7_5x20
) needs to be applied to the search, too.50K rows? Write a special, one-time, script to perform the above transformation to the existing 50K rows. Then incorporate the transformations into both the INSERTs and the SELECTs.