I have a MySQL DB that needs to be fast at scale.
Option 1 Tables can store the language ISO 639-3 code as a column: varchar(3) language
Option 2 Tables can store the ID for the language as a column: int(2?) language_id, and there can be a languages table with the ISO 639-3 code.
Question What makes sense for speed at scale? Option 1 is easier to read in the DB. I'd prefer it if speed is the same or completely negligible even at scale.
Thanks!
I recommend:
That will be 3 bytes, which is smaller than
INT
(4 bytes)and not much bigger than
SMALLINT UNSIGNED` (2 bytes).(Am I correct in saying that the codes are always 3 ascii letters? Hence no need for
VAR
, which takes an extra byte or two.)CHAR(3)
is readily indexable. There is no significant advantage in 'normalizing' even to smallint. This still applies even at the scale of a billion rows.And, as you point out, "easier to read" is worth something.
If you are also storing text, I assume that all such text can be mapped to UTF-8? If so, use
In MySQL, there is no problem having different columns in a single table using different charsets (or collations).
Perhaps worth noting... Many languages can be discovered from the hex utf-8 encoding:
-- http://mysql.rjweb.org/doc.php/charcoll#diagnosing_charset_issues