The database for an application I manage use UUIDs. These UUIDs are stored as char(36)
with a utf8
character set. From a strict performance point of view, I know this is not optimal. The general recommendation seem to be to use a binary(16)
column for UUIDs. I cannot change the data type, but I can change the character set.
The characters in a UUID can be a digit 0 through 9, or letter a through f.
By changing the character set from utf8
to ascii
, the total size of all indexes for the database will probably be reduced by several gigabytes.
The application connects to the database and explicitly sets character encoding and connection collation in the connection string : characterEncoding=utf8&connectionCollation=utf8
.
What will I have to do (if anything at all) to ensure a safe "conversion" from utf8 to ascii for the UUIDs?
Standard UUIDs (as opposed to home-grown ones) use only hex, which is a subset of ascii characters which is a subset of utf8. The encoding (how the bits are arranged) is identical. Hence, no data loss in the conversion.
One thing to be careful of. If you are
JOINing
on a uuid, do make sure theCHARACTER SET
andCOLLATION
of the column in both tables is the same. Otherwise, there will be a big performance hit. (MySQL is not smart enough to understand that you have simply hex.)Also, check the collation -- if it is
..._ci
, then "a" == "A", etc. This would be beneficial if you might need case folding. (..._bin
treats a..f as different than A..F.)In InnoDB, with either
VARCHAR
orCHAR
, ascii or utf8mb4 with virtually any length of at least 36, works identically.Yes, do have a backup handy, just in case.
For large tables, UUIDs have an unavoidable performance problem. I discuss it here: http://mysql.rjweb.org/doc.php/uuid