Uncontrollable MySQL ibd datafile resizing

37 Views Asked by At

My mySQL database has quite a number of columns (aprox. 750). Besides mostly DECIMAL(13,2) also some short CHARs, INTs and aprox. 25 columns of CHAR(150). The number of all rows is approximately 27,000. As long as the values in the CHAR(150) columns are NULL, the size of the ibd file on the disk is about 270 MB, which I believe is a reasonable size given the mentioned number of records and columns (about 10kB per line). The problem arises when I start entering values into one of the CHAR(150) columns. After writing the value "1" into only one of the CHAR(150) columns in 1000 rows, the file size increases by 4096 kB, even though expected to increase by 150 kB (length CHAR(150) times 1000 = 150 kB). At first I thought that the MySQL engine increases the size of the file in larger chunks in order to optimize workload (e.g. one 4096 kB chunk for first 1000 rows, then for the next approx. 25 repetitions of writting in 25 x 1000 rows the size does not increase etc.). But unfortunately this is not the case. For every 1000 rows entered, the size increases by 4096 kB. So if I write the value "1" in one CHAR(150) column in all 27,000 rows, the file size increases from 270 MB to 1.07 GB, which is approximately 29 kB per individual line for the entered data, which in the case of CHAR(150) should represent only 150B per line (i.e. 4 MB alltogether).

What I am missing? And how to proceed to keep the file size within reasonable limits?

PLEASE NOTE: The main question is why the size of the file increases unpropotionaly according to data added. i.e: Writing 4 MB of data increases file by aprox. 800 MB.

MySQL version is 8.0, storage engine INNODB, sql-mode "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,PAD_CHAR_TO_FULL_LENGTH", innodb_file_per_table=1, character-set-server=utf8mb4, collation-server=utf8mb4_0900_as_cs

1

There are 1 best solutions below

1
Bill Karwin On

https://dev.mysql.com/doc/refman/8.0/en/innodb-tablespace-autoextend-size.html:

If the tablespace is more than 32 extents in size, it is extended four extents at a time.

An extent is 1MB (when the page size is the default 16KB). This is why the file increases by 4MB when you see it grow.

When your CHAR column is NULL, it takes no space. But as soon as you set one of your CHAR columns to a non-NULL value, that column takes as much space as the full length of the column. E.g. CHAR(150) takes 150 bytes.

Your table might have stored a lot of rows with NULLs on the same page. But if they suddenly require more storage per row, that'll require InnoDB to copy the set of rows from one page onto multiple pages, since it can't fit the same number of rows per page. To do this, it probably needs to expand the tablespace, and it does this in 4MB increments. If the tablespace had already been expanded, and had some unoccupied extents, it'll try to use those unoccupied pages before expanding the tablespace further.

Further, rows are not packed perfectly into pages anyway. For example even though a page is 16KB, you might think 16 rows of 1000 bytes each would store perfectly into one page, but this is not so. InnoDB has some extra bytes per page needed for organizing the data, linking to the next page, etc. And besides that, InnoDB reserves 1/16th of the page in case you update rows in the future. So it could take several pages to store 16 rows of 1000 bytes each.

InnoDB can't immediately recycle the pages storing the former copy of the rows either. It might need to preserve the former version of the rows, for other transactions to read until they commit. A background thread marks those old row versions deleted. But it might not have obsoleted every row on that page. Over time, a page may become only partially filled. Generally this is a good tradeoff, because storage is cheap and the advantages of MVCC are worth it.