MySQL how to index a query that searches for a substring in column while filtering integer columns

611 Views Asked by At

I have a table with a billion+ rows. I have have the below query which I frequently execute:

SELECT SUM(price) FROM mytable WHERE domain IN ('com') AND url LIKE '%/shop%' AND date BETWEEN '2001-01-01' AND '2007-01-01';

Where domain is varchar(10) and url is varchar(255) and price is float. I understand that any query with %..% will not use any index. So logically, I created an index on price domain and date:

create index price_date on mytable(price, domain, date)

The problem here persists, this index is also not used because query contains: url LIKE '%.com/shop%'

On the other hand a FULLTEXT index still will not work since I have other non text filters in the query.

How can I optimise the above query? I have too many rows not to use an index.

UPDATE

Is this an sql limit? could such a query provide better performance on a noSQL database?

2

There are 2 best solutions below

5
On

You have two range conditions, one uses IN() and the other uses BETWEEN. The best you can hope is that the condition on the first column of the index uses the index to examine rows, and the condition on the second column of the index uses index condition pushdown to make the storage engine do some pre-filtering.

Then it's up to you to choose which column should be the first column in the index, based on how well each condition would narrow down the search. If your condition on date is more likely to reduce the set of examined rows, then put that first in the index definition.

The order of terms in the WHERE clause does not have to match the order of columns in the index.

MySQL does not support optimizing with both a fulltext index and a B-tree index on the same table reference in the same query.

You can't use a fulltext index anyway for the pattern you are searching for. Fulltext indexes don't allow searches for punctuation characters, only words.

0
On

I vote for this order:

INDEX(domain,       -- first because of "="
      date,         -- then range
      url, price)   -- "covering"

but, since the constants look like most of the billion rows would be hit, I don't expect good performance.

If this is a common query and/or "shop" is one of only a few possible filters, we can discuss whether a summary table would be useful.