In case I have a table partitioned by year; how do I avoid the scanning of all partitions when I have to lookup a row by its ID and can't use partition pruning in the lookup query?
CREATE TABLE part_table (
id bigint NOT NULL auto_increment,
moment datetime NOT NULL,
KEY (id),
KEY (moment)
)-- partitioning information (in years)
PARTITION BY RANGE( YEAR(moment) ) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION pFuture VALUES LESS THAN (maxvalue) )
;
With e.g. lookup query:
SELECT * FROM part_table WHERE ID = <nr>
PRIMARY KEY(id, moment)
orPRIMARY KEY(moment, id)
instead ofINDEX(id)
?id=123
requires checkingINDEX(id)
in each partition. Herein lies one of the reasons why aPARTITIONed
table is sometimes slower than the equivalent non-partitioned table.Show us the main queries you have. I will probably explain why you should not partition the table. I see two possible benefits in your definition:
DELETEing
it.Some cases
For this discussion, I assuming partitioning by a datetime in some fashion (
BY RANGE(TO_DAYS(moment))
orBY ... (YEAR(moment))
, etc).Partitioning probably hurts slightly because, regardless of what indexes are available, the query must look in every partition.
This is a case where partition "pruning" is beneficial. It will look in one or two partitions (depending on whether or not the query is being run in January). Then it will somewhat efficiently use the index to lookup by
id
.Now let be discuss two flavors if an index starting with
id
(and assuming either of theWHERE
clauses, above:The PK is "clustered" with the data. That is, the data is sorted by first
id
thenmoment
. Hence theid BETWEEN...
will find the rows consecutively in the BTree -- this is the most efficient. TheAND moment...
works to filter out some of the rows.is not "clustered". It is a secondary index. Secondary indexes take two steps. (1) search the secondary BTree for the ids, but without filtering by
moment
; (2) reach into the data BTree using the artificial PK that was provided for you; (3) now the filtering bymoment
can happen. More steps, more blocks to read, etc.id much faster and less invasive than `DELETE .. WHERE moment < '2021-01-01'.
More
It is important to look at all the main queries.
X=constant
versusX BETWEEN...
can make a big difference in optimization; please provide concrete examples that are realistic for your app.Also, sometimes a "covering" index can make up for otherwise inefficient indexes. So those examples need to show all the columns in the important queries. And what datatypes they are.
In the absence of such details, I will make the following broad statements (which might be invalidated by the specifics):
WHERE
references only one column, thePARTITIONing
is probably never beneficial.WHERE
has one=
test and one 'range' test, there is probably a composite index that will work much better than partitioning.PRIMARY KEY
.WHERE
cannot use some index, that implies a scan of the partition. If there are only a few partitions, that could be a big scan.