I need simple explanation of why my queries fail to bring the results i need.
Sphinx 2.0.8-id64-release (r3831) Here is what i have in sphinx.conf:
SELECT
trackid,
title,
artistname,
SUBSTRING(REPLACE(TRIM(`artist_name`), 'the ', ''),1,3) AS artistname_init
....
sql_field_string = title
sql_field_string = artistname
sql_field_string = artistname_init
Additional settings:
docinfo = extern
charset_type = utf-8
min_prefix_len = 1
enable_star = 1
expand_keywords= 0
charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z, A..Z->a..z, a..z
Query works. I index my data without problems. However i am failing to make sphinx bring any sensible results. I am using SphinxQL to query.
Example:
select
artistname, artistname_init from myindex
WHERE MATCH('@artistname_init ^t*')
GROUP BY artistname ORDER BY artistname_init ASC limit 0,10;
brings nothing related to the query. I've tried everything i could think of like:
MATCH('@artistname_init ^t*')
MATCH('@artistname_init[1] t')
MATCH('@artistname_init ^t$')
Can anyone please point where is my mistake and perhaps give me query that will work for my case? My target is to get results that follow this sorting order:
B (Single letter)
B-T (Single letter + non-alphabet sign after)
B as Blue (Single letter + space after)
Baccara (First letter of single word)
Bad Religion (First letter of several words)
The B (not counting "The ")
The B.Y.Z (Single letter + non-alphabet sign after not counting "The ")
The B 2 B (Single letter + space after not counting "The ")
The Boyzz (First letter of single word not counting "The ")
The Blue Boy (First letter of several words not counting "The ")
Or close to it.
There are a lot of moving parts in what you're trying to do, but I can at least answer the title portion of it. Sphinx offers field-level ranking factors to let you customize the WEIGHT() function – it should be much easier to order the matches the way you want, rather than trying to actually filter out entries that matched the query later than the 1st or 2nd word.
Here's an example, which will return all results with a word starting with "b", sorted by how early that word appears:
If you want to filter out other cases like "Several other words then B", I think I'd suggest doing that in your application. For example, if the fourth result has the keyword in the 3rd word, only return the first 3 results. That, or actually create a new field in Sphinx without the leading "The", and then add a numeric attribute to the index to show that a word was removed (you can use numeric attributes in your ranker expressions).
As for ranking "B-t" more highly than "Bat", I'm not sure if that's possible without somehow changing Sphinx's concept of alphabetical order.. You could try diving into the source code? ;)
One last note. For this particular kind of query, MySQL (I say MySQL because it's the common way of sourcing a Sphinx index) may actually work just as well. If you strip the leading "The", a B-tree index (which MySQL uses) is a perfectly good way of searching if you're sure you only want results where the query matches the beginning of the field. Sphinx's inverted indexes are kind of overkill for that sort of thing.