Can we have performance issues when using mixed table charsets in MySQL or in Postgres?

I'm working on a database whose tables use different charsets. Since it's a big database, I was wondering whether this could lead to performance issues. Granted, the usual value comparison a DB does is the JOIN, and ours are done comparing integers, but are there any other performance problems we could run into when tables use different charsets, other than the extra space some charsets take?

MySQL:
For zip_code (postal_code), stored as a string (CHAR or VARCHAR), most charsets work equally well. However, when JOINing on such a column, the collation must be the same.
- If it is the same, an index on that column can be used.
- If it is not, then the index is useless, and the query must scan the entire table.
Since the collation includes the charset, that forces the charset to be the same, also.
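As a sketch of that JOIN case (the table and column names here are invented, and the collations are just examples of a mismatch):

```sql
-- Two hypothetical tables whose postal_code columns use different
-- character sets and therefore different collations.
CREATE TABLE customers (
    id          INT PRIMARY KEY,
    postal_code VARCHAR(10)
        CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    KEY idx_postal (postal_code)
);

CREATE TABLE delivery_zones (
    id          INT PRIMARY KEY,
    postal_code VARCHAR(10)
        CHARACTER SET latin1 COLLATE latin1_swedish_ci,
    KEY idx_postal (postal_code)
);

-- Joining on postal_code forces one side to be converted to the other's
-- charset/collation, so (as described above) the index on that side is
-- useless and the join has to scan the table.
EXPLAIN
SELECT c.id, z.id
FROM customers AS c
JOIN delivery_zones AS z ON z.postal_code = c.postal_code;
```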
The choice of collation is rather minor. However, if there can be letters in the string (postal_code, country_code, etc.), you need to decide whether to force the tables (and user queries) to use a particular case.
- Collation ..._bin treats cases as different: 'de' won't match 'DE' (for Germany).
- Collation ..._ci is "Case Insensitive", so those would match.
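A quick sketch of that difference (it assumes a utf8mb4 connection; the specific collation names are just examples of a _ci and a _bin collation):

```sql
-- Case-insensitive vs. binary collation on the same pair of strings.
SELECT 'de' = 'DE' COLLATE utf8mb4_general_ci AS ci_match,   -- 1: they match
       'de' = 'DE' COLLATE utf8mb4_bin        AS bin_match;  -- 0: they don't
```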
If you do string comparisons with incompatible collations, those comparisons cannot use an index on the string column. I've seen this happen when doing a JOIN on a string column where the joined tables had different collations (and if they have different character sets, their collations are necessarily different too).
But you said your joins are on integer columns, not string columns. So joins shouldn't be a problem in your case.
You can also have performance problems when doing lookups against string columns if your table character set doesn't match your session character set.
Example: My table is defined with utf8mb4, but I set my session to utf8, so string literals will be utf8. Seems like a harmless change, right?
Mostly it is: the utf8 string 'abc123' has a clear way to be promoted to utf8mb4 to match the column it is compared to, so the lookup can still use an index on that column.
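A minimal sketch of that case (the table mytable, its column str_col, and its index are invented names; the outcomes in the comments are the behaviour described in this answer, not captured output):

```sql
SET NAMES utf8;  -- string literals from this session are now utf8 (utf8mb3)

-- str_col is an indexed utf8mb4 column. The utf8 literal is promoted to
-- utf8mb4, so EXPLAIN should show an indexed lookup (type: ref),
-- not a full table scan.
EXPLAIN SELECT * FROM mytable WHERE str_col = 'abc123';
```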
But if I force a specific collation that utf8mb4 does not support, the query has to do a table scan and compare rows one by one instead of an indexed lookup:
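Continuing the same sketch (utf8_general_ci is just one example of a collation that belongs to a charset other than utf8mb4):

```sql
-- Forcing a utf8 (utf8mb3) collation on the comparison means the indexed
-- utf8mb4 column can no longer be matched through its index; per the
-- description above, EXPLAIN shows a full table scan (type: ALL).
EXPLAIN SELECT * FROM mytable
WHERE str_col = 'abc123' COLLATE utf8_general_ci;
```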
There's a difference between implicit collations and explicit collations. Suppose I set my session to use something that doesn't have a clear path to utf8mb4:
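For example (latin1 here is only an illustration of such a session character set):

```sql
SET NAMES latin1;  -- literals are now latin1, not a utf8 subset of utf8mb4

-- No explicit COLLATE: under the implicit coercion rules the literal is
-- still converted to the column's utf8mb4 collation, so the indexed
-- lookup keeps working.
EXPLAIN SELECT * FROM mytable WHERE str_col = 'abc123';
```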
So far so good, but if I am explicit about the collation:
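A sketch of the explicit case (latin1_bin is only an example; the exact behaviour depends on the version and the collations involved):

```sql
-- An explicit COLLATE clause takes precedence over the column's collation,
-- so the comparison can no longer go through the index on str_col; MySQL
-- either falls back to comparing rows one by one or rejects the query with
-- an "Illegal mix of collations" error, depending on the collations used.
EXPLAIN SELECT * FROM mytable
WHERE str_col = 'abc123' COLLATE latin1_bin;
```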
The bottom line is that you should use the same character set and collation to make your life easier. Use it for all tables and for the session too.
In these modern times, it's hard to think of a reason to use anything other than utf8mb4.
P.S. Space shouldn't be a problem. UTF-8 character sets allow multibyte characters, but they don't expand characters that fit in a single byte: UTF-8 is a variable-width encoding, so characters in the ASCII range (0-127) are still stored as one byte each. Read the UTF-8 article on Wikipedia for details; it has a nice explanation.
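A quick way to see this from MySQL (the literals are arbitrary; LENGTH counts bytes, CHAR_LENGTH counts characters):

```sql
-- ASCII text costs one byte per character even in utf8mb4;
-- only characters outside the ASCII range need more bytes.
SELECT LENGTH(CONVERT('abc123' USING utf8mb4))      AS ascii_bytes,     -- 6
       CHAR_LENGTH(CONVERT('abc123' USING utf8mb4)) AS ascii_chars,     -- 6
       LENGTH(CONVERT('café' USING utf8mb4))        AS accented_bytes,  -- 5 ('é' is 2 bytes)
       CHAR_LENGTH(CONVERT('café' USING utf8mb4))   AS accented_chars;  -- 4
```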