Currently I'm testing out TokuDB and I'm very impressed. At the moment the insert rate peaks at just over 50,000 inserts per second with two simultaneous jobs running. The average insert rate is between 38,000 and 42,000 inserts per second.
I would like to go even higher, to 100,000 inserts per second, as I will need to insert 1.2 billion calculated rows for now and about 6 billion more in the near future. I would like some advice on how to achieve this :-)
My current setup:
- Hardware: VPS with 4GB of RAM, 150GB SSD, 2 cores: Intel Westmere E56xx/L56xx/X56xx (Nehalem-C) 2.59GHz CPU
- Disk mount options: defaults,noatime
- OS: CentOS 6.8 64bit
- Database: Percona Server 5.7.14-8
My.cnf settings:
# TokuDB #
tokudb_cache_size = 2G
tokudb_commit_sync = 0
tokudb_fsync_log_period = 1000
TokuDB table layout:
CREATE TABLE `t1` (
`id` int(15) NOT NULL AUTO_INCREMENT,
`m_id` int(11) NOT NULL,
`c1` decimal(6,2) DEFAULT NULL,
`c2` decimal(6,2) DEFAULT NULL,
`c3` decimal(6,2) DEFAULT NULL,
`c4` decimal(6,2) DEFAULT NULL,
`c5` decimal(6,2) DEFAULT NULL,
`c6` decimal(6,2) DEFAULT NULL,
`c7` decimal(6,2) DEFAULT NULL,
`factor` decimal(4,2) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1
CREATE TABLE `t2` (
`id` int(15) NOT NULL AUTO_INCREMENT,
`v_id` int(15) NOT NULL,
`pid` int(11) DEFAULT NULL,
`amount` decimal(6,2) DEFAULT NULL,
`unit` int(1) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1
I'm aware of the fact that I'm not using any indexes other than the primary key index. This is because of the negative impact additional keys would have on insert speed. A clustering key for each table will be created at the end of the insert job.
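For illustration, the clustering keys could be added afterwards with something along these lines (the choice of m_id and v_id as the clustered columns is an assumption here, not something decided yet):
ALTER TABLE t1 ADD CLUSTERING KEY idx_m_id (m_id);
ALTER TABLE t2 ADD CLUSTERING KEY idx_v_id (v_id);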
Additional statement run from the MySQL command line:
SET unique_checks=OFF;
Somehow I'm not able to get this into the my.cnf. If someone knows how, that would be greatly appreciated (currently putting unique_checks = off in my.cnf blocks MySQL from starting due to an unknown variable).
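A possible workaround I haven't verified: since unique_checks has no my.cnf option name of its own, the server can be told to run the statement for every new connection via init_connect (note that init_connect is skipped for accounts with the SUPER privilege, so the insert job would have to connect as a regular user):
[mysqld]
init_connect = 'SET unique_checks=0'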
The SQL statements are grouped in batches of 15,000. A PHP script generates the SQL statements and sends the batch to the MySQL server via mysqli's multi_query():
<?php
foreach (generateCombinations($Arr) as $c) {
    // Parent row for t1; $record comes from the surrounding script (not shown)
    $QueryBatch[] = "insert into t1 values (NULL"
        . ", " . $record->id
        . ", " . rand(1, 35)
        . ", " . rand(1, 140)
        . ", " . rand(1, 20)
        . ", NULL"
        . ", " . rand(1, 14)
        . ", " . rand(1, 300)
        . ", " . rand(1, 4)
        . ", NULL );";
    $QueryBatch[] = "SET @t1id = LAST_INSERT_ID();";
    $cntBatch++;

    // Child rows for t2, referencing the parent row through @t1id
    $pquery = array();
    foreach ($c as $key => $pid) {
        if (is_null($pid))
            continue;
        $pquery[] = "(NULL, @t1id, " . $pid . ", " . rand(1, 800) . ", 0)";
        $cntBatch++;
    }
    $QueryBatch[] = "insert into t2 values " . implode(',', $pquery) . ";";

    // Flush the batch once it has grown past ~15,000 statements/rows
    if ($cntBatch > 15000) {
        $query = implode($QueryBatch);
        if ($mysqli->multi_query($query)) {
            while ($mysqli->next_result()) {;}
        } else {
            printf("Errormessage: %s\n", $mysqli->error);
            echo $query . "\n";
        }
        $cntBatch = 0;
        unset($QueryBatch);
    }
}
?>
Example of SQL insert statement:
insert into t1 values (NULL, 1 , 30, 100, 15, NULL, 10, 250, 2, NULL );
SET @t1id = LAST_INSERT_ID();
insert into t2 values (NULL, @t1id, 1, 750, 0),(NULL, @t1id, 1, 600, 0),(NULL, @t1id, 1, 500, 0),(NULL, @t1id, 1, 400, 0),(NULL, @t1id, 1, 300, 0),(NULL, @t1id, 1, 200, 0),(NULL, @t1id, 1, 100, 0);
insert into t1 values (NULL, 2 , 25, 95, 14, NULL, 11, 200, 3, NULL );
SET @t1id = LAST_INSERT_ID();
insert into t2 values (NULL, @t1id, 1, 600, 0),(NULL, @t1id, 1, 100, 0),(NULL, @t1id, 1, 300, 0),(NULL, @t1id, 1, 443, 0),(NULL, @t1id, 1, 521, 0),(NULL, @t1id, 1, 213, 0),(NULL, @t1id, 1, 433, 0);
[.. At least 14982 more..]
If it were me, I would cut down the number of statements being executed, and cut down the number of commits. I'm assuming that AUTO_COMMIT is enabled, given that we don't see any BEGIN TRANSACTION or COMMIT statements.
That's a whole boatload of individual INSERT and SET statements. At least the inserts into the child table are using the multi-row insert form, not a separate insert statement for each row.
If I needed this to be fast, I would:
- generate the id values for the t1 table, and include those in the INSERT statements
- do away with the calls to LAST_INSERT_ID()
- use multi-row inserts for t1 (rather than a separate INSERT statement for each row)
- use BEGIN TRANSACTION and COMMIT
- run a single process to perform the inserts into t1 (serialize) to avoid potential contention for locks
If it were for InnoDB, I'd also do SET FOREIGN_KEY_CHECKS=0.
There's already a boatload of calls to the rand function in the code, so incrementing an integer id for t1 isn't going to move the needle. When we start, we'd need a query to get the current AUTO_INCREMENT value, or get the MAX(id), whichever...
Basically, I'd cut down the number of statements being executed, get more work done by each statement, and get more work done before each COMMIT.
Inserting ten (10) t1 rows per statement would significantly reduce the number of statements that need to be executed.
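As a rough sketch of one such batch (the id values 1001 and 1002 are assumptions; in practice the client would seed itself from SELECT MAX(id) FROM t1 and hand out ids itself), the three statements per parent row collapse into a couple of multi-row statements inside one transaction:
START TRANSACTION;
insert into t1 values
  (1001, 1, 30, 100, 15, NULL, 10, 250, 2, NULL),
  (1002, 2, 25, 95, 14, NULL, 11, 200, 3, NULL);
insert into t2 values
  (NULL, 1001, 1, 750, 0), (NULL, 1001, 1, 600, 0), (NULL, 1001, 1, 500, 0),
  (NULL, 1002, 1, 600, 0), (NULL, 1002, 1, 100, 0), (NULL, 1002, 1, 300, 0);
COMMIT;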
LOAD DATA INFILE
Any discussion of insert performance would be incomplete without at least mentioning LOAD DATA INFILE. For best performance, that can't be beat. But since we don't have the data in a file, we don't have the key values (needed for the foreign key in t2), and we've got all the calls to rand to generate the data, LOAD DATA INFILE doesn't seem to be a good fit.
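For reference, this is the general shape it would take if the rows were generated into a file first (the file path and column list here are assumptions for illustration only):
LOAD DATA INFILE '/tmp/t1.csv'
INTO TABLE t1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, m_id, c1, c2, c3, c4, c5, c6, c7, factor);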