TokuDB select distinct awfully slow

147 Views Asked by At

I have a table using the TokuDB engine in MariaDB 10.2

The table has a currently about ~700 Million measurements for ~2500 sensors.

I want to get all the distinct sensor names in the table.

There is a index of this names.

Still, MariaDB seems to do a full table scan and needs ~5min.

Anything I can do to speed it up?

The table is:

CREATE TABLE `sensor_data_single` (
`sensor_id` varchar(255) CHARACTER SET latin1 NOT NULL,
`timestamp` bigint(20) NOT NULL,
`value_bool` bit(1) DEFAULT NULL,
`value_boolarray` tinyblob DEFAULT NULL,
`value_bytearray` tinyblob DEFAULT NULL,
`value_date` date DEFAULT NULL,
`value_time` int(11) DEFAULT NULL,
`value_instant` datetime(6) DEFAULT NULL,
`value_double` double DEFAULT NULL,
`value_enum_code` int(11) DEFAULT NULL,
`value_enum_text` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`value_int` int(11) DEFAULT NULL,
`value_long` bigint(20) DEFAULT NULL,
`value_string` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`value_type` int(11) DEFAULT NULL,
PRIMARY KEY (`sensor_id`,`timestamp`),
KEY `sensor_data_ts_key` (`timestamp`) `CLUSTERING`=YES,
KEY `sensor_data_id_key` (`sensor_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci `COMPRESSION`=TOKUDB_FAST

The query is a plain

select distinct sensor_id from sensor_data_single USE INDEX (sensor_data_id_key) order by sensor_id ASC;

Still, while the query runs there is State like Queried about 513770000 rows in SHOW PROCESSLIST

The explain says the index is used, but shouldn't this be much faster considering there are only ~2500 distinct values???

+------+-------------+--------------+-------+---------------+--------------------+---------+------+-----------+-------------+
| id   | select_type | table        | type  | possible_keys | key                | key_len | ref  | rows      | Extra       |
+------+-------------+--------------+-------+---------------+--------------------+---------+------+-----------+-------------+
|    1 | SIMPLE      | sensordata0_ | index | NULL          | sensor_data_id_key | 257     | NULL | 764382458 | Using index |
+------+-------------+--------------+-------+---------------+--------------------+---------+------+-----------+-------------+

If I drop the index on sensor_id, the explain changes to:

+------+-------------+--------------+-------+---------------+---------+---------+------+--------+--------------------------+
| id   | select_type | table        | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+------+-------------+--------------+-------+---------------+---------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | sensordata0_ | range | NULL          | PRIMARY | 257     | NULL | 175017 | Using index for group-by |
+------+-------------+--------------+-------+---------------+---------+---------+------+--------+--------------------------+

but performance doesn't change significantly.

0

There are 0 best solutions below