MySQL 5.7 character_set_client stuck at utf8mb4


Long story short: we have a PHP-based self-developed CMS, originally on PHP5.x and MySQL, using a healthy combination of utf8 and iso-8859-1 char-sets (don't judge, I know it's weird but it's working). On our production environment our server provider upgraded to PHP7.2 and (after a few weeks of refactoring) everything works just fine.

Parallel to this production environment I've set up (or at least I tried to) a test environment for our development, VirtualBox Ubuntu 20.04, apache2.4, PHP7.2 and MySQL5.7.

In /etc/php/7.2/apache2/php.ini I have:

default_charset = "iso-8859-1"

In /etc/mysql/my.cnf I have:

[client]
default-character-set   = utf8


[mysqld_safe]
default-character-set   = utf8

[mysql]
default-character-set   = utf8


[mysqld]
init_connect                   = 'SET NAMES utf8'
character-set-client-handshake = false #force encoding to utf8
character-set-server           = utf8
collation-server               = utf8_unicode_ci

Now, on our development server, character_set_client and character_set_results are both utf8mb4, and I can't find a way to change them.

The problem is that when I import dumps from our production server on our development server (through our CMS), or when I save text with special characters like ü or ä, the word is always cut at the first such character and only the part before it is saved, e.g. instead of chüd it saves only ch, and instead of einträge it saves only eintr.

However, I can save ü manually in the DB without a problem (I don't have to use &amp;uuml;).

(we have a second development server, Ubuntu 14.04, apache2.4, PHP5.6, MySQL5.7 and basically the same settings as on PHP7.2 testserver, and everything works fine)

Maybe PHP7.2 is causing the mess here; I am really out of ideas.

Any help will be appreciated. Thank you


There are 1 best solutions below


See "truncation" in "Trouble with UTF-8 characters; what I see is not what I stored".

I wonder if having apache not set to UTF-8 messes up <form>s.

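init_connect = 'SET NAMES utf8' sets three character_set_% variables (character_set_client, character_set_connection and character_set_results), but only if you are not connecting as "root": init_connect is skipped for users with the SUPER privilege. So, change it to utf8mb4 and do not connect as "root".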

Are you sure about the encoding in the imported data? (I suspect this causes the truncation problem.) Can you get a hex dump of a small portion of the data?
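One way to get such a hex dump and a quick encoding check is sketched below in Python. This is only an illustration: `hex_preview` and `looks_like_utf8` are hypothetical helper names, and the sample bytes stand in for a real dump file that you would read in binary mode.

```python
def hex_preview(data: bytes, limit: int = 32) -> str:
    """Return the first `limit` bytes as space-separated upper-case hex."""
    return data[:limit].hex(" ").upper()


def looks_like_utf8(data: bytes) -> bool:
    """True if the whole byte string decodes as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Sample bytes standing in for a slice of the dump; for a real file, use
# something like: data = open(path_to_dump, "rb").read()
sample_utf8 = "einträge".encode("utf-8")
sample_latin1 = "einträge".encode("latin-1")

print(hex_preview(sample_utf8), looks_like_utf8(sample_utf8))
print(hex_preview(sample_latin1), looks_like_utf8(sample_latin1))
```

If the check fails on data that contains ä or ü, the dump is probably latin1 (or some other single-byte encoding) rather than UTF-8.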

For Western European languages, MySQL's utf8 and utf8mb4 work the same. That is, the init_connect that you have should be adequate _if_ the incoming data is really UTF-8, not iso...

For reference here are hex values:

char latin1 utf8
ä    E4     C3A4
ü    FC     C3BC
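The hex values above, and the truncation described in the question, can be reproduced with a small Python sketch. It only imitates the symptom: the assumption is that latin1 bytes arrive on a connection declared as UTF-8 and the text is cut at the first byte that is not valid UTF-8. `valid_utf8_prefix` is a hypothetical helper, not MySQL's actual code.

```python
def valid_utf8_prefix(data: bytes) -> str:
    """Return the longest leading part of `data` that is valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte
        return data[:err.start].decode("utf-8")


# Reproduce the reference table
for ch in "äü":
    print(ch, ch.encode("latin-1").hex().upper(), ch.encode("utf-8").hex().upper())

# latin1 bytes treated as UTF-8 are cut at the umlaut
print(valid_utf8_prefix("einträge".encode("latin-1")))
print(valid_utf8_prefix("chüd".encode("latin-1")))

# Real UTF-8 bytes pass through unchanged
print(valid_utf8_prefix("chüd".encode("utf-8")))
```

Under this assumption the latin1 inputs come out as eintr and ch, matching what ends up in the database.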