I am using a MySQL database. I have a table daily_price_history of stock values, with the following fields. It has 11 million+ rows:
id
symbolName
symbolId
volume
high
low
open
datetime
close
So for each stock symbolName there are many daily stock values, and the data has grown past 11 million rows.
The following SQL tries to get the last 100 days of daily data for a set of 1500 symbols:
SELECT `daily_price_history`.`id`,
`daily_price_history`.`symbolId_id`,
`daily_price_history`.`volume`,
`daily_price_history`.`close`
FROM `daily_price_history`
WHERE (`daily_price_history`.`id` IN
(SELECT U0.`id`
FROM `daily_price_history` U0
WHERE (U0.`symbolName` = `daily_price_history`.`symbolName`
AND U0.`datetime` >= 1598471533546)))
AND `daily_price_history`.`symbolName` IN ('A', 'AA', ...... 1500 symbol names)
I have the table indexed on symbolName and also on datetime.
Getting ~150K rows of data (i.e. 1500 symbols x 100 days) takes 20 seconds.
I also have weekly_price_history and monthly_price_history tables. When I run a similar SQL against them, they take less time for the same number (~150K) of rows, because they hold less data than the daily table:
weekly_price_history (2.5 million rows total): getting 150K rows takes 3s.
monthly_price_history (800K rows total): getting 150K rows takes 1s.
So how can I speed up the query when the table is this large?
When you add INDEX(a, b), remove INDEX(a) as being no longer necessary. Your dataset and query may be a case for using PARTITIONing. Partitioning by datetime lets the optimizer do "partition pruning" for datetime >= 1598471533546. Then the PRIMARY KEY will do most of the rest of the work for symbolName IN ('A', 'AA', ...). Aim for about 50 partitions; the exact number does not matter. Too many partitions may hurt performance; too few won't provide effective pruning.
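A rough sketch of that layout, assuming id is the current AUTO_INCREMENT primary key and datetime holds millisecond epoch values (the quarterly boundaries below are illustrative; in practice you would create around 50 finer-grained partitions):

```sql
-- Widen the PRIMARY KEY so symbolName leads it; keep an index on id
-- so the AUTO_INCREMENT column still starts a key.
ALTER TABLE `daily_price_history`
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (`symbolName`, `datetime`, `id`),
    ADD INDEX (`id`);

-- Partition by the millisecond-epoch datetime so the range predicate
-- can prune partitions. Boundary values are example quarter starts.
ALTER TABLE `daily_price_history`
    PARTITION BY RANGE (`datetime`) (
        PARTITION p2020q1 VALUES LESS THAN (1585699200000),  -- 2020-04-01
        PARTITION p2020q2 VALUES LESS THAN (1593561600000),  -- 2020-07-01
        PARTITION p2020q3 VALUES LESS THAN (1601510400000),  -- 2020-10-01
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );
```

Note that MySQL requires the partitioning column (datetime here) to be part of every unique key, which is why the PRIMARY KEY is widened.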
Yes, get rid of the subquery as GMB suggests.
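Without the correlated subquery, the query collapses to a single range-plus-IN scan (a sketch; the cutoff and symbol list are the ones from the question):

```sql
SELECT `id`, `symbolId_id`, `volume`, `close`
FROM `daily_price_history`
WHERE `datetime` >= 1598471533546          -- "last 100 days" cutoff
  AND `symbolName` IN ('A', 'AA' /* , ... the remaining 1500 symbol names */);
```

With the PRIMARY KEY starting with (symbolName, datetime), each symbol in the IN list becomes a short index range scan.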
Meanwhile, it sounds like Django is getting in the way.
Some discussion of partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
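The usual maintenance pattern, sketched here with the hypothetical partition names from above, is to drop the oldest partition and split the catch-all pmax partition as time advances:

```sql
-- Purge the oldest quarter cheaply (no row-by-row DELETE).
ALTER TABLE `daily_price_history` DROP PARTITION p2020q1;

-- Carve the next quarter out of the open-ended pmax partition.
ALTER TABLE `daily_price_history`
    REORGANIZE PARTITION pmax INTO (
        PARTITION p2020q4 VALUES LESS THAN (1609459200000),  -- 2021-01-01
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );
```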