I want to compare write-intensive performance between the InnoDB and MyRocks engines of MySQL. For this purpose, I use sysbench as the benchmark. My requirements are:
- multiple threads writing concurrently to the same table;
- batch inserts (each insert transaction inserts a bulk of records).
I checked all the pre-made sysbench tests and don't see any that satisfy both requirements:
- oltp_write_only: supports multiple threads writing to the same table, but it has no bulk-insert option.
- bulk_insert: supports bulk inserts from multiple threads, but each thread writes to a different table.

Are there any pre-made sysbench tests that satisfy my requirements? If not, is there an existing custom Lua script somewhere that already does this?
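For reference, a custom script along these lines might look like the sketch below. It assumes the sysbench 1.0 Lua API; the script name, the `batch_size` and `engine` options, and the generated data are all illustrative, and the table follows the schema quoted in the comment further down.

```lua
#!/usr/bin/env sysbench
-- bulk_write_same_table.lua -- a sketch, not a stock sysbench test:
-- every worker thread issues multi-row INSERTs into the *same* table.

sysbench.cmdline.options = {
   batch_size = {"Rows per multi-row INSERT (one transaction each)", 100},
   engine     = {"Storage engine to test (e.g. innodb or rocksdb)", "innodb"}
}

-- Simple script-defined commands: `sysbench bulk_write_same_table.lua ... prepare|cleanup`
function prepare()
   local drv = sysbench.sql.driver()
   local con = drv:connect()
   -- DEFAULT for `data` omitted: every INSERT supplies it, and literal defaults
   -- on JSON columns are not portable across MySQL versions.
   con:query(string.format([[
      CREATE TABLE IF NOT EXISTS tableA (
        id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id VARCHAR(63) NOT NULL DEFAULT '',
        data    JSON NOT NULL,
        PRIMARY KEY (id),
        UNIQUE INDEX user_id_UNIQUE (user_id)
      ) ENGINE = %s]], sysbench.opt.engine))
end

function cleanup()
   local drv = sysbench.sql.driver()
   local con = drv:connect()
   con:query("DROP TABLE IF EXISTS tableA")
end

function thread_init()
   drv = sysbench.sql.driver()
   con = drv:connect()
   counter = 0   -- per-thread row counter, keeps generated user_id values unique
end

function thread_done()
   con:disconnect()
end

function event()
   local values = {}
   for i = 1, sysbench.opt.batch_size do
      counter = counter + 1
      -- embed the thread id so user_id never collides across concurrent threads
      local user_id = string.format("u%d-%d", sysbench.tid, counter)
      values[i] = string.format("('%s', '{\"b\": %d}')", user_id, counter % 32768)
   end
   -- one event = one multi-row INSERT (a single autocommit transaction),
   -- and all threads write to the same table
   con:query("INSERT INTO tableA (user_id, data) VALUES " .. table.concat(values, ", "))
end
```

Invoke it with the usual `--mysql-*` connection options plus `--threads`, `--time`, `--batch_size`, and `--engine`, once with `prepare` and then with `run` for each engine being compared.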
(From a comment:)
CREATE TABLE IF NOT EXISTS `tableA` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`user_id` VARCHAR(63) NOT NULL DEFAULT '',
`data` JSON NOT NULL DEFAULT ('{}'),
PRIMARY KEY (`id`),
UNIQUE INDEX `user_id_UNIQUE` (`user_id` ASC)
) ENGINE = InnoDB;
(From a MySQL point of view...)
- Drop `id` and its PRIMARY KEY -- saves 8 bytes per row.
- Promote `UNIQUE(user_id)` to `PRIMARY KEY(user_id)` -- might save 40 bytes per row (depends on `LENGTH(user_id)`).

Doing those will shrink each row, hence the disk footprint and the I/O.
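If you try that, the `prepare` step of the script sketched above would create the slimmed-down table instead; again only a sketch, reusing the hypothetical `engine` option from that script:

```lua
-- Variant of prepare(): no surrogate id; user_id promoted to PRIMARY KEY.
con:query(string.format([[
   CREATE TABLE IF NOT EXISTS tableA (
     user_id VARCHAR(63) NOT NULL,
     data    JSON NOT NULL,
     PRIMARY KEY (user_id)
   ) ENGINE = %s]], sysbench.opt.engine))
```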
Run OS monitoring tools to see what percentage of the I/O is being consumed. That is likely to be the limiting factor.
Benchmarking products are handy for limited situations. For your situation (and many others), it is best to build your product and time it.
Another thought...
What does the JSON look like? If the JSON has a simple structure (a consistent set of key:value pairs), then the disk footprint might be half as much (hence speed doubling) if you made individual columns. The processing to change from JSON to individual columns would be done in the client, which may (or may not) cancel out the savings I predict.
If the JSON is more complex, there still might be savings by pulling out "columns" that are always present.
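If the structure really is that simple, the `event()` function in the script sketched above could build rows from individual columns rather than from a JSON string. Here is a variant, with purely illustrative column names (`b` as a 2-byte `SMALLINT`, `note` as a `VARCHAR`) that would also need to be added to the table in `prepare()`:

```lua
-- Illustrative variant of event(): scalar columns instead of a JSON document.
function event()
   local values = {}
   for i = 1, sysbench.opt.batch_size do
      counter = counter + 1
      local user_id = string.format("u%d-%d", sysbench.tid, counter)
      values[i] = string.format("('%s', %d, 'note-%d')", user_id, counter % 32768, i)
   end
   con:query("INSERT INTO tableA (user_id, b, note) VALUES " .. table.concat(values, ", "))
end
```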
If the JSON is "big", then compress it in the client, then write it to a `BLOB`. This may shrink the disk footprint and network bandwidth by a factor of 3.

You mentioned 250GB for 250M rows? That's 1000 bytes/row, which suggests the JSON averages about 700 bytes (note: there is row overhead). Compressing the JSON column into a `BLOB` would shrink it to maybe 400 bytes/row total, hence only 100GB for 250M rows.

`{"b": 100}` takes about 10 bytes. If `b` could be stored in a 2-byte `SMALLINT` column, that would shrink the record considerably.

Another thing: if you promote `user_id` to the PK, then this is worth considering: use a file sort to order the data by `user_id` before loading it. This is probably faster than `INSERT`ing the rows 'randomly'. (If the data is already sorted, then this extra sort would be wasted.)
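For completeness, the client-side compression idea as yet another `event()` variant. This is only a sketch: it assumes the `data` column has been changed to a `BLOB`, and that a zlib binding such as lua-zlib is installed and reachable by sysbench's LuaJIT (sysbench does not bundle one); the tiny document used here would not actually compress well.

```lua
-- Sketch: compress the JSON in the client, store the result in a BLOB column.
-- Requires an external zlib binding (e.g. lua-zlib). The compressed bytes are
-- hex-encoded so they can be inlined into the SQL text as X'...'.
local zlib = require("zlib")

local function to_hex(s)
   local t = {}
   for i = 1, #s do t[i] = string.format("%02X", s:byte(i)) end
   return table.concat(t)
end

function event()
   local values = {}
   for i = 1, sysbench.opt.batch_size do
      counter = counter + 1
      local user_id = string.format("u%d-%d", sysbench.tid, counter)
      local doc = string.format('{"b": %d, "note": "a larger, repetitive payload would go here"}',
                                counter % 32768)
      local compressed = zlib.deflate()(doc, "finish")   -- lua-zlib one-shot deflate
      values[i] = string.format("('%s', X'%s')", user_id, to_hex(compressed))
   end
   con:query("INSERT INTO tableA (user_id, data) VALUES " .. table.concat(values, ", "))
end
```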